Abstract. This paper discusses how local measurements of three-dimensional 
positions and surface normals (recorded by a set of tactile sensors, or by three- 
dimensional range sensors), may be used to identify and locate objects, from among 
a set of known objects. The objects are modeled as polyhedra having up to six 
degrees of freedom relative to the sensors. We show that inconsistent hypotheses 
about pairings between sensed points and object surfaces can be discarded efficiently 
by using local constraints on: distances between faces, angles between face normals, 
and angles (relative to the surface normals) of vectors between sensed points. We 
show by simulation and by mathematical bounds that the number of hypotheses 
consistent with these constraints is small. We also show how to recover the position 
and orientation of the object from the sense data. The algorithm’s performance on 
data obtained from a triangulation range sensor is illustrated. 
1. The Problem and the Approach 


A central characteristic of advanced applications in robotics is the presence of 
significant uncertainty about the identities and positions of objects in the workspace 
of the robot. It is this characteristic that makes sensing of the external environment 
an essential component of robot systems. The process of sensing can be loosely 
divided into two stages: first, the measurements of properties of the objects in the 
environment, and second, the interpretation of those measurements. In the present 
paper, we will concentrate on the interpretation of sensory data. In investigating 
this problem, we make only a few, simple assumptions about available sensory 
measurements, rather than considering specific details of a particular sensor. As a 
consequence, the interpretation technique that is developed here should be applicable 
to a wide range of sensing modalities. As well, the interpretation technique may 
have implications for the design of three-dimensional sensors. 

1.1. Problem Definition 

The specific problem we consider in this paper is to identify an object from 
among a set of known objects and to locate it relative to the sensor. The object 
sensed is assumed to be a single, possibly non-convex, polyhedral object (for which 
we have an accurate geometric model). The object may have up to six degrees 
of freedom relative to the sensor (three translational and three rotational). The 
sensor, which could be tactile or range, is assumed to be capable of providing 
three-dimensional information about the position and local surface orientation of a 
small set of points on the object. Each sensor point is processed to obtain: 

1. Surface points — On the basis of sensor readings, the positions of some 
points on the sensed object can be determined to lie within some small 
volume relative to the sensor. 

2. Surface normals — At the sensed points, the surface normal of the object’s 
surface can be recovered to within some cone of uncertainty. 

Our goal is to use local information about sensed points to determine the set 
of positions and orientations of an object that are consistent with the sensed data. 
If there are no consistent positions and orientations, the object is excluded from 
the set of possible objects. 

In this paper we do not discuss how surface points and normals may be obtained 
from actual sensor data, since this process is highly sensor-dependent (for references 
to existing measurement methods see Section 1.3). Our aim is to show, instead, 
how such data may be used in conjunction with object models to recognize and 
localize objects. The method, in turn, suggests criteria for the design of sensors and 
sensor-processing strategies. 

Our only assumption about the input data is that fairly accurate positions 
of surface points are obtainable from the sensor, but that significant errors exist 
in determining normal information. This assumption reflects the type of data 
obtainable from tactile sensors. Range sensors based on triangulation can be used 
to obtain high quality measurements of normals from patches of depth data. The 
availability of good normal data merely increases the efficiency of the method. 

1.2. Approach 

A recent paper [Gaston and Lozano-Perez 83] introduced a new approach to 
tactile recognition and localization for polyhedra with three degrees of positional 
freedom (two translational and one rotational). The present paper generalizes that 
approach to polyhedra with six degrees of positional freedom. The inputs to the 
recognition process are: a set of sensed points and normals, and a set of geometric 
object models for the known objects. The recognition process, as outlined in the 
earlier paper, proceeds in two steps: 

1. Generate Feasible Interpretations: A set of feasible interpretations of the 
sense data is constructed. Interpretations consist of pairings of each sensed 
point with some object surface of one of the known objects. Interpretations 
inconsistent with local constraints (derived from the model) on the sense 
data are discarded. 

2. Model Test: The feasible interpretations are tested for consistency with 
surface equations obtained from the object models. An interpretation is 
legal if it is possible to solve for a rotation and translation that would 
place each sense point on an object surface. The sensed point must lie 
inside the object face, not just on the surface. 

The first step is the key to this process. The number of possible interpretations 
given $s$ sensed points and $n$ surfaces is $n^s$. Therefore, it is not feasible to carry out 
a model test on all possible interpretations. The goal of the recognition algorithm is 
to exploit the local constraints on the sensed data so as to minimize the number of 
interpretations that need testing. This approach is an instance of a classic paradigm 
of artificial intelligence: generate and test; see for example [Buchanan, et al. 69]. 

Consider a simple example of the approach, illustrated in Figure 1. The model 
is a right triangle, with edge sizes of 3, 4, and 5 respectively. From this model, we 
can construct a table of ranges of distances between pairs of points on the edges. 
The table is as follows: 



Distance Ranges Between Edges 

          Edge 1    Edge 2    Edge 3
Edge 1    [0,3]     [0,5]     [0,4]
Edge 2    [0,5]     [0,4]     [0,3]
Edge 3    [0,4]     [0,3]     [0,5]


Now, suppose we know the positions of the three sensed points, $P_1$ through $P_3$, shown 
in Figure 1. The measured distances between those points are dist($P_1$, $P_2$) = 3.5, 
dist($P_1$, $P_3$) = 4.4, dist($P_2$, $P_3$) = 0.8. From this we see that any interpretation of 
the sensed points that assigns $P_1$ and $P_2$ both to edge 1 is inconsistent with the 
model. Similarly, assigning $P_1$ and $P_2$ to edges 2 and 3 is not consistent. Many other 
pairwise assignments of points to edges can be discarded simply by comparing the 
measured distances to the ranges in the table. Note that the sensed positions are 
subject to error, so that a range of actual distances is consistent with the measured 
positions. It is these distance ranges that must be compared against the ranges 
in the table. For this example, only 6 of the 27 possible assignments of the three 
points to the three model edges are legal. 

Grimson & Lozano-Pérez 
Model-Based Recognition 

Figure 1. An example of the approach 
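
The pruning in the triangle example can be reproduced with a short script. The sketch below is only an illustration under stated assumptions: the names `MAX_DIST`, `MEASURED`, `consistent`, and `feasible_interpretations` are hypothetical, and the parameter `eps` stands in for an assumed bound on the error of each measured distance (the text does not give one). Every minimum distance in the table is 0, so only the upper ends of the ranges are stored.

```python
from itertools import product

# Upper end of the distance range for each pair of edges of the 3-4-5
# triangle; the lower end is 0 for every pair, as in the table.
MAX_DIST = {
    (1, 1): 3, (1, 2): 5, (1, 3): 4,
    (2, 1): 5, (2, 2): 4, (2, 3): 3,
    (3, 1): 4, (3, 2): 3, (3, 3): 5,
}

# Measured distances between sensed points, keyed by (point, point) with
# points numbered 0, 1, 2 for P1, P2, P3.
MEASURED = {(0, 1): 3.5, (0, 2): 4.4, (1, 2): 0.8}


def consistent(assignment, eps=0.0):
    """Return True if every measured distance fits its edge-pair range.

    assignment[p] is the edge paired with sensed point p.
    eps is an assumed bound on the error of a measured distance.
    """
    for (a, b), d in MEASURED.items():
        if d - eps > MAX_DIST[(assignment[a], assignment[b])]:
            return False
    return True


def feasible_interpretations(eps=0.0):
    """Enumerate all 3^3 assignments and keep the consistent ones."""
    return [t for t in product((1, 2, 3), repeat=3) if consistent(t, eps)]
```

With `eps = 0.0` this enumeration keeps six of the 27 assignments, which agrees with the count quoted in the text.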

Of the six interpretations consistent with the distance ranges, the two shown 
in Figure 1 are completely consistent once the line equations of the edges are taken 
into account. Each of these interpretations leads to a solution for the position and 
orientation of the triangle relative to the sensor. Furthermore, these positions and 
orientations of the triangle place the measured points inside the finite edges, not 
just on the infinite line. 

This paper discusses both steps of the recognition process, focusing first 
on the generate step and then considering the model testing stage. We show, by 
mathematical analysis and by simulation, that the number of feasible interpretations 
can be reduced to manageable numbers by the use of local geometric constraints. 
In particular, we investigate the effectiveness of the different local constraints and 
the impact of measurement errors on their effectiveness. We further show that the 
few remaining feasible interpretations can efficiently be subjected to an explicit 
model test, generally resulting in a single interpretation of the sense data (up to 
symmetries). We also illustrate the performance of the algorithm on range data 
obtained by triangulation. 
1.3. Three Dimensional Sensing 

Sensors can be roughly divided into two categories: non-contact and contact. 
Non-contact sensing, especially visual sensing, has received extensive attention in 
the robotics and artificial intelligence literature. Contact sensing, such as tactile or 
haptic sensing, plays an equally important role in robotics, but has received much 
less attention. In this paper, our aim is to develop a sensory interpretation method 
that is applicable to data from both contact and non-contact sensors. 

While two-dimensional sensing, for example silhouette or binary vision, may 
be adequate for restricted situations such as problems with three degrees of freedom 
in positioning, the general localization and recognition problem requires three- 
dimensional sensing. Throughout this paper, we will concentrate on the six-degree 
of freedom recognition and localization problem and the use of three-dimensional 
sensing. Restrictions of the method to the simpler case of three degrees of freedom 
are straightforward. 

1.3.1. Previous Work in Visual Range Sensing 

The measurement stage of visual sensing has received extensive attention in the 
literature. Of particular interest here are methods for obtaining three-dimensional 
position and surface normal information; see [Jarvis 83] for a detailed survey. 
Possible methods include edge-based stereo systems [e.g. Grimson 81, Baker and 
Binford 81, Mayhew and Frisby 81], which provide three-dimensional positions of 
sparse sets of points in the image. This sparse data can be used to reconstruct a 
dense surface representation, from which surface normals can be estimated [Grimson 
82, 83; Terzopoulos 83]. Other methods for obtaining three-dimensional positions 
are laser range-finding [e.g. Nitzan, Brain, and Duda 77, Lewis and Johnston 77] 
and structured-light systems [e.g. Shirai and Suwa 71, Popplestone, et al. 75]. 
Many other visual processes can be used to obtain surface normal information 
directly, e.g., photometric stereo [e.g. Woodham 78, 80, 81, Ikeuchi and Horn 79] 
and texture gradients [Bajcsy 73, Bajcsy and Liebermann 76, Kender 80, Stevens 
80]. In fact, there is no constraint that the sensory data for one problem must come 
from one sensory modality. Data from visual sensors and tactile sensors may be 
combined in one run of the algorithm. 

The interpretation stage of visual recognition has received less attention, 
especially when dealing with three-dimensional objects with six degrees of freedom. 
Much of the previous work in the area of interpretation of three-dimensional data 
has focused on the recognition of simple generic objects such as planar patches, 
regular polyhedra, generalized cylinders, and spheres [e.g., Shirai and Suwa 71, 
Popplestone, et al. 75, Nitzan, Brain, and Duda 77, Oshima and Shirai 78, 
Faugeras, et al. 83, Agin and Binford 73, Nevatia and Binford 77]. Some authors 
have examined the problem we deal with here of recognizing specific objects from 
three-dimensional data [e.g., Shneier 79, Sugihara 79, Oshima and Shirai 83, Bolles, 
Horaud, and Hannah 83, Brou 83, Ikeuchi, et al. 83]. The principal difference 
between previous work on recognition and the approach described here is our 
reliance on sparse data acquired at points. This makes our approach adaptable to 
contact sensing as well as visual sensing. The sparseness of the data does make the 
problem of segmentation, determining which data is drawn from which objects in 
a scene, more difficult. Further research on this topic is currently underway. 

In the final stages of preparing this paper, we became aware of the work of 
Faugeras and Hebert [83] which adopts an approach that is similar in many respects 
to the one described here. Their work, however, focuses on deriving an accurate 
model test. Their method does not emphasize the problem of enumerating all the 
legal interpretations of the data. Instead, a measure of the accuracy of the model 
test (and a simple angle pruning heuristic) is used to drive a best-first search for 
a good interpretation. This method does not ensure that the interpretation found 
is the only one consistent with the data, however. Their method and ours are 
complementary in this respect. Their approach also does not assume sparse data, 
but it is in fact applicable to that problem. 

1.3.2. Previous Work in Tactile Sensing 

Contact sensors measure the locus of contact and the forces generated when 
in contact with an object. We make the distinction between tactile sensors, which 
measure forces over small areas, such as a fingertip, and force sensors, which 
measure the resultant forces and torques on some larger structure, such as a 
complete gripper. A micro-switch, for example, can serve as a simple tactile sensor 
capable of detecting when the force over a small area, e.g. an elevator button, 
exceeds some threshold. The most important type of tactile sensors are the matrix 
tactile sensors, composed of an array of sensitive points. The simplest example 
of a matrix tactile sensor is an array of micro-switches. Much more sophisticated 
tactile sensors, with much higher spatial and force resolution, have been designed; 
see [Harmon 82] for a review and [Hillis 82, Overton and Williams 83, Purbrick 81, 
Raibert and Tanner 82, Schneiter 82] for some recent designs. 

For descriptions of previous work in tactile sensing, we refer the reader to 
two very thorough surveys by Harmon [80, 82]. A more detailed discussion of 
previous work on tactile recognition can be found in [Gaston and Lozano-Perez 83]. 
In this section, we briefly survey the two major alternative approaches to tactile 
recognition: statistical pattern recognition, and description-building and matching. 

Much of the existing work on tactile recognition has been based on statistical 
pattern recognition or classification. Some researchers have used pressure patterns 
on matrix sensors primarily [Briot 79, Okada and Tsuchiya 77]. Others have used 
the joint angles of fingers grasping the object as their data [Briot, Renaud, and 
Stojilkovic 78, Marik 81, Okada and Tsuchiya 77, Stojilkovic and Saletic 75]. A 
related approach uses the pattern of activation of on-off contacts placed on the 
finger links [Kinoshita, Aida, and Mori 75]. 

The range of possible contact patterns between multiple sensors and complex 
objects is highly variable and seems to require detailed geometric analysis. Tactile 
recognition methods based on statistical pattern recognition are limited to dealing 
with simple objects because they do not exploit the rich geometric data available 
from object models. 
Several proposed recognition methods build a partial description of the object 
from the sense data and match this description to the model. One approach emulates 
the feature-based descriptions in vision systems, for example, identification of holes, 
edges, vertices, pits, and burrs [Binford 72, Hillis 82, Snyder and St. Clair 78]. 
Another approach is to build surface models, either from pressure distributions 
on matrix sensors [Overton and Williams 81], or from the displacements of an 
array of needle-like sensors [Page, Pugh, and Heginbotham 76, Takeda 74]. A 
related approach builds a representation of an object’s cross section [Ozaki, et al. 
82, Kinoshita, Aida, and Mori 75]. 

Description-based methods are more general than statistical methods but must 
solve two formidable problems: building accurate object descriptions from tactile 
data, and matching the descriptions to the models. One major difficulty is that 
existing sensors do not have the spatial or force resolution needed to build nearly 
complete object descriptions. Furthermore, there are few methods for matching 
the partial descriptions obtainable from tactile sensors to object models. In our 
opinion, part of the problem in tactile data interpretation has been the tendency 
to adapt the techniques developed for two-dimensional vision, where dense data is 
readily obtainable, to tactile data, which is naturally sparse. 

One lesson from the simulations described later is that some estimate of surface 
normal is an extremely powerful constraint on recognition and localization. The 
estimate need not be very tight for performance to improve drastically. There has 
been little previous emphasis on measuring surface normals with tactile sensors. 
Accuracy in measuring normals requires some attention to engineering tradeoffs 
in sensor design, especially the sensor stiffness. In a stiff sensor (one that deforms 
very little under contact), the normal to the sensor surface at the point of contact 
directly gives an estimate of the object’s surface normal. So, a stiff sensor with high 
spatial resolution can be used to measure normals. In a soft sensor, the pattern of 
forces can be analyzed to determine the shape of the object surface. So, a soft sensor 
with good force measurement accuracy can also be used. Today, it is probably 
easier to build stiff sensors with poor force resolution than soft sensors with good 
force resolution [Snyder and St. Clair 78]. This argues that a stiff VLSI sensor [e.g. 
Raibert and Tanner 82] may be acceptable. Another factor is that the method used 
here, since it is based on local information, does not require large sensor areas; it 
can function better with many small sensors. 

The approach used in this paper is an instance of a description-based 
recognition method. The basic departure from previous methods is the reliance on 
sparse three-dimensional positions and surface normals obtained at points (see footnote 1). This 
contrasts with the dense area data needed in global feature-based or surface-based 
description methods. The point-based data we use is more readily obtainable 
from simple tactile sensors and the process of matching it to models is relatively 
straightforward. Therefore, the method described here could be a powerful addition 
to approaches based on more complete descriptions. 

Footnote 1. Very different approaches to tactile recognition based on this type of data are outlined in [Dixon, 
Salazar, and Slagle 79, Ivancevic 74]. 
2. Generating Feasible Interpretations 


After sensing an object, we have the positions of up to $s$ points, $P_i$, known to 
be on the surface of one of the $m$ known objects, $O_j$, having $n_j$ faces. The range of 
possible pairings of sensed points and model faces for one object can be cast in the 
form of an interpretation tree (IT) [Gaston and Lozano-Perez 83]. The root node 
of the $IT_j$, for object $O_j$, has $n_j$ descendants, each representing an interpretation 
in which $P_1$ is on a different face of $O_j$. There are a total of $s$ levels in the tree, 
level $i$ indicating the possible pairings of $P_i$ with the faces of object $O_j$ (see Figure 
2). Note that there may be multiple points on a single face, so that the number of 
branches remains constant at all levels. 

A k-interpretation is any path from the root node to a node at level k in the 
IT; it is a list of k pairings of points and faces. The set of IT’s contains a very large 
number of possible s-interpretations 

$$\sum_{j=1}^{m} (n_j)^s .$$

In an object with symmetries, of course, the IT is highly redundant [Gaston and 
Lozano-Perez 83]. The m IT’s, one for each known object, represent the search 
space for the recognition problem discussed here. 
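
To give a feel for the size of this search space, the count of s-interpretations can be evaluated directly. The helper name `count_s_interpretations` and the face counts in the usage note are illustrative assumptions, not values from the paper.

```python
def count_s_interpretations(face_counts, s):
    """Total number of s-interpretations over all known objects.

    face_counts[j] is the number of faces n_j of object j, and s is the
    number of sensed points; each object contributes n_j ** s leaves.
    """
    return sum(n ** s for n in face_counts)
```

For the triangle of Section 1.2 (one object, three edges, three points) this gives `count_s_interpretations([3], 3) == 27`, the figure quoted there.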

2.1. Pruning the IT by Local Constraints 

Only a very few interpretations in an IT are consistent with the input data. 
We can exploit the following local constraints to prune inconsistent interpretations: 
1. Distance Constraint: The distance between each pair of $P_i$'s must be a 
possible distance between the faces paired with them in an interpretation. 

2. Angle Constraint: The range of possible angles between measured 
normals at each pair of $P_i$'s must include the known angle between surface 
normals of the faces paired with them in an interpretation. 

3. Direction Constraint: The range of values for the component of a vector 
between sensed points ($P_i \to P_j$) in the direction of the sensed normal at 
$P_i$ and at $P_j$ must intersect the range of components of possible vectors 
between points on the faces assigned to $P_i$ and $P_j$ by the interpretation. 

These constraints typically serve to prune most of the non-symmetric s- 
interpretations of the data. Other constraints are possible, for example, the area of 
the triangle defined by three sensed points must be contained within the range of 
areas defined by the faces paired with them, and the pairing of sensed points with 
faces must not be such as to require that the path of the sensor (beam) pass through 
some portion of the object before sensing that face [Gaston and Lozano-Perez 83]. 
We will focus on the three constraints above, primarily because they are simple to 
implement while being quite effective. Moreover, they capture all the constraints 
between pairs of points. 

Note that the distance, angle, and direction constraints can be used to prune 
$k$-interpretations, for $k \ge 2$, thereby collapsing whole subtrees of the IT. This is a 
crucial point, worth dwelling on for a moment. 

Recall that the overall problem we are considering is to determine the position 
and orientation of an object, using sparse sensory data. In principle, one could 
consider all possible interpretations of the data, and for each one, determine 
whether there is a transformation from model coordinates to sensor coordinates 
that would account for the sensory data. Unfortunately, this is computationally 
extremely expensive. In order to compute such a model test, we need three points, 
whose corresponding face normals are linearly independent, as well as the measured 
normals at those points. Clearly, we would in general need $k$ sensory points to 
ensure this, where $k \ge 3$. Thus, if $n$ is the number of faces in the object, we would 
need to consider on the order of $n^k$ model tests, each of which requires considerable 
computational effort. 

On the other hand, using the simple geometric constraints outlined above 
requires only a straightforward table lookup, and, as we shall see, can drastically 
reduce the number of interpretations to which a model test must be applied. Since 
the constraints can be applied near the root of the tree, it is possible to prune 
whole subtrees from the IT, at virtually no computational expense. 
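
The effect of applying pairwise constraints near the root can be sketched as a depth-first walk of one interpretation tree. This is a minimal sketch rather than the authors' implementation: `search_interpretation_tree` and the predicate `pair_ok` are hypothetical names, and `pair_ok` is assumed to bundle whatever local tests (distance, angle, direction) are available for a pair of point-face pairings.

```python
def search_interpretation_tree(num_faces, num_points, pair_ok):
    """Depth-first walk of one interpretation tree with pairwise pruning.

    pair_ok(p, face_p, q, face_q) is an assumed predicate that returns True
    when pairing point p with face_p and point q with face_q passes the
    local constraints. Returns every surviving full interpretation as a
    tuple of face indices, one per sensed point.
    """
    results = []

    def extend(partial):
        k = len(partial)  # index of the next sensed point to pair
        if k == num_points:
            results.append(tuple(partial))
            return
        for face in range(num_faces):
            # Check the new pairing against every earlier pairing on the path.
            if all(pair_ok(p, partial[p], k, face) for p in range(k)):
                partial.append(face)
                extend(partial)
                partial.pop()
            # A failed check skips the entire subtree below this node.

    extend([])
    return results
```

A rejected pairing at level k removes all `num_faces ** (num_points - k - 1)` leaves beneath it without visiting them, which is the saving the text describes.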

We consider each of the constraints in more detail below. 

2.1.1. Distance Pruning 

If an interpretation calls for pairing two of the sensed points with two object 
faces, the distance between the sensed points must be within the range of distances 
between the faces (see also [Bolles and Cain 82]). Note that the distances between 
all pairs of sensed points must be consistent, i.e., there are three distances between 
three sensed points, and in general $\binom{k}{2}$ distances between $k$ sensed points. Because 
of this, the distance constraint typically becomes more effective as more sensed 
points are considered. 

Given two faces on a three-dimensional object, we can compute the range of 
distances between points on the faces. The minimum distance may be determined 
as the minimum of the shortest distance between all pairs of edges and the 
perpendicular distances between vertices of one face and the plane of the other 
face (when the vertex projects inside the face polygon). The maximum requires 
examining distances between pairs of vertices. Note that we can also compute the 
range of distances between points on one face (zero up to the diameter of the 
face). Sophisticated algorithms may be used to reduce the complexity of these 
computations, but since they are to be performed off-line, once for each model, 
their efficiency is not critical to the approach. 
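
The maximum-distance half of this off-line computation is simple enough to sketch. Distance is a convex function, so its maximum over two planar polygons is attained at a pair of vertices. The function name `max_face_distance` and the representation of a face as a list of 3D vertex tuples are assumptions made for illustration; the minimum distance, which needs the edge-edge and vertex-plane cases described above, is not covered here.

```python
import math
from itertools import product


def max_face_distance(face_a, face_b):
    """Largest distance between a point of face_a and a point of face_b.

    Each face is a list of (x, y, z) vertex tuples. The maximum is attained
    at a pair of vertices, so only vertex pairs are examined.
    Passing the same face twice gives the diameter of that face.
    """
    return max(math.dist(p, q) for p, q in product(face_a, face_b))
```

For the bottom and top faces of a unit cube this returns the length of the space diagonal, and for a single unit-square face it returns the diagonal of the square.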

The distance constraint can be implemented in the following manner. For 
object $O_j$, with $f_j$ faces, we construct an $f_j$ by $f_j$ table, whose entries determine the 
range of possible distances between pairs of faces. In particular, for a pair of faces 
$(i, k)$, $i \ne k$, the maximum distance between the faces is stored in table location 
$dtable_j[\max(i, k), \min(i, k)]$ and the minimum distance between the faces is stored in 
table location $dtable_j[\min(i, k), \max(i, k)]$. If $i = k$, we simply store the maximum 
distance in the diagonal entry $dtable_j[i, i]$, since the minimum distance defaults 
to 0. This representation makes checking a distance constraint straightforward, 
since the set of all pairs of faces $(i, k)$ on object $O_j$ consistent with some measured 
distance $d$ is given by 

$$\{ (i, k) \mid dtable_j[\min(i, k), \max(i, k)] \le d \le dtable_j[\max(i, k), \min(i, k)] \}$$

plus the pair $(i, i)$ if $d \le dtable_j[i, i]$. 
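
The packed table layout can be illustrated as follows. This is a sketch under assumptions: `build_dtable` and `distance_consistent` are hypothetical names, faces are indexed from 0, and the callables `min_dist` and `max_dist` stand in for the off-line geometric computations, which are not shown.

```python
def build_dtable(num_faces, min_dist, max_dist):
    """Pack all distance ranges for one object into a single square table.

    min_dist(i, k) and max_dist(i, k) are assumed callables returning the
    minimum and maximum distance between faces i and k.
    """
    table = [[0.0] * num_faces for _ in range(num_faces)]
    for i in range(num_faces):
        table[i][i] = max_dist(i, i)        # diagonal: diameter of face i
        for k in range(i + 1, num_faces):
            table[i][k] = min_dist(i, k)    # above the diagonal: minimum
            table[k][i] = max_dist(i, k)    # below the diagonal: maximum
    return table


def distance_consistent(table, i, k, d):
    """True if measured distance d is possible between faces i and k."""
    if i == k:
        return 0.0 <= d <= table[i][i]
    lo, hi = min(i, k), max(i, k)
    return table[lo][hi] <= d <= table[hi][lo]
```

Storing the minimum above the diagonal and the maximum below it means one table per object holds both ends of every range, and a constraint check costs two lookups.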

Given any $(k-1)$-interpretation, represented by the set of faces $(i_1, \ldots, i_{k-1})$, 
and a new $k$th sensed point, the generation of the next level of the IT below this 
interpretation can be easily computed by checking the appropriate portions of the 
distance tables. In particular, if the measured distance between one of the previous 
sensed points, $P_l$, and the new one is given by $d_l$, the set of possible faces that can 
be assigned to sensed point $P_k$ is given by 

$$\bigcap_{l=1}^{k-1} \{ i \mid dtable_j[\min(i, i_l), \max(i, i_l)] \le d_l \le dtable_j[\max(i, i_l), \min(i, i_l)] \}$$

unioned with the set 

$$\bigcup_{l=1}^{k-1} \{ i_l \mid 0 \le d_l \le dtable_j[i_l, i_l] \}.$$

For very complex objects, much more time efficient ways of representing and 
searching for faces that satisfy a distance constraint are possible. A full discussion 
of these methods is beyond the scope of this paper, however. 
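
Generating the next level of the IT from the distance table might look like the sketch below. It takes one reading of the set expressions above: a face is kept only if its distance range to every previously assigned face contains the corresponding measured distance, with the diagonal entry used when the candidate repeats an earlier face. The name `candidate_faces`, the 0-based face indices, and the argument layout are assumptions.

```python
def candidate_faces(table, assigned, dists):
    """Faces that may be paired with a newly sensed point.

    table is the packed distance table (minimum above the diagonal,
    maximum on and below it). assigned[l] is the face already paired with
    earlier point l, and dists[l] is the measured distance from that point
    to the new one.
    """
    def ok(i, k, d):
        if i == k:
            return 0.0 <= d <= table[i][i]
        lo, hi = min(i, k), max(i, k)
        return table[lo][hi] <= d <= table[hi][lo]

    return [i for i in range(len(table))
            if all(ok(i, face, d) for face, d in zip(assigned, dists))]
```

Applied to the triangle of Section 1.2 with edges renumbered 0 to 2, pairing the first point with edge 0 leaves edges 1 and 2 as candidates for a second point measured 3.5 away.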
We note that it may frequently be the case, e.g. for a flat tactile sensor, that 
the sensor makes contact along an edge or at a vertex, rather than in the interior 
of a face. The method described above would still work unchanged under these 
circumstances. But if the sensor is capable of detecting that contact is at a vertex or 
edge, then tighter constraints can be applied. This is accomplished by constructing 
tables of distance ranges between vertices and between edges and applying the 
pruning algorithm based on those tables when appropriate. 

Similarly, in the case of visual sensing, if the edges and vertices of an object 
can be reliably determined from the sense data, the recognition process is greatly 
simplified. (Note the relationship to the recognition method used in [Bolles, et al. 
82].) 

2.1.2. Angle Pruning 

Sensed points are associated with a range of legal surface normals consistent 
with the sensory data. If an interpretation calls for pairing two of the sensed points 
(and normals) with two object faces, the range of angles between the sensed normals 
must include the angle between the normals of the corresponding object faces. 

To see how this information can be used to prune the IT, we first consider 
the case in which the object has three degrees of freedom (two translational and 
one rotational). Under this restriction on degrees of freedom, the range of surface 
normals can be represented as a range of angles relative to the hand frame. 

At a sensed point $P$, we can measure the local surface normal as lying in the 
range of angles 

$$\phi \in [\omega - \epsilon, \omega + \epsilon]$$

where $\omega$ is the actual measurement, and $\epsilon$ defines the range of possible angles about 
this measurement. We are given a sensor point $P_1$, with measured normal $\omega_1$, which 
has been assigned to face $i$, with associated model coordinate surface normal given 
by $\psi_i$. Next, we record a second point $P_2$, with measured normal $\omega_2$, which has 
been assigned to face $k$, with associated model coordinate surface normal given 
by $\psi_k$. For these assignments to be consistent, the angle 
between the model faces must be included in the range of angles between the ranges 
of normals determined from the measured normals and the error bounds: 

$$(\omega_2 - \omega_1) - (\epsilon_1 + \epsilon_2) \le \psi_k - \psi_i \le (\omega_2 - \omega_1) + (\epsilon_1 + \epsilon_2).$$

It is clear that an implementation similar to that used for distance pruning will also 
suffice here. For object $O_j$, with $e_j$ edges, we can set up an $e_j$ by $e_j$, lower diagonal 
table $atable_j$ such that $atable_j[\max(i, k), \min(i, k)] = \psi_k - \psi_i$. This representation 
makes checking a surface normal constraint straightforward, since the set of all 
pairs of faces $(i, k)$ on object $O_j$ consistent with some measured ranges of surface 
normals is given by 

$$\{ (i, k) \mid (\omega_2 - \omega_1) - (\epsilon_1 + \epsilon_2) \le atable_j[\max(i, k), \min(i, k)] \le (\omega_2 - \omega_1) + (\epsilon_1 + \epsilon_2) \}.$$
Figure 3. Angle Ranges 

Given any $(k-1)$-interpretation, and a new $k$th sensed point, the generation of the 
next level of the IT below this interpretation can easily be computed by checking the 
appropriate portions of the angle tables. Note that the $k$th edge must be consistent 
with the angles between all previous faces. 

In the two-dimensional (three degree of freedom) case, the range of possible 
surface normals at a sensed point was represented by the pair $(\omega_1, \epsilon_1)$ where $\omega_1$ 
denoted the sensed normal, and $\epsilon_1$ denoted the range of error about that sensed 
point. In three dimensions, the obvious generalization is to use angle cones, so that 
if $\mathbf{u}_1$ denotes the unit sensed surface normal, the range of possible values for the 
actual surface normal will be denoted by the right circular cone 

$$\{ \mathbf{n}_1 \mid \mathbf{n}_1 \cdot \mathbf{u}_1 \ge \epsilon_1 \}.$$

We could proceed identically to the two-dimensional case by noting that the cone of 
sensed normals constrains the set of possible three-dimensional rotations between 
the hand and model coordinate systems. Then, given a second sensed point $P_2$ with 
some sensed normal, the set of feasible faces would be restricted by the range of 
possible rotations. This method is quite difficult to implement, however. There is a 
much simpler alternative method. 

Suppose that at the second sensed point, the set of possible surface normals in 
hand coordinates is given by 

$$\{ \mathbf{n}_2 \mid \mathbf{n}_2 \cdot \mathbf{u}_2 \ge \epsilon_2 \}.$$

Then, in order for faces $i$ and $k$, with associated surface normals $\mathbf{v}_i$ and $\mathbf{v}_k$, to be 
consistent it must be the case that 

$$\mathbf{v}_i \cdot \mathbf{v}_k \in \{ \mathbf{n}_1 \cdot \mathbf{n}_2 \mid \mathbf{n}_1 \cdot \mathbf{u}_1 \ge \epsilon_1,\ \mathbf{n}_2 \cdot \mathbf{u}_2 \ge \epsilon_2 \}.$$

We can rephrase this in the following manner. Let $\cos\alpha_1 = \epsilon_1$, $\cos\alpha_2 = \epsilon_2$, 
$\alpha_{12} = \alpha_1 + \alpha_2$ and $\cos\gamma_{12} = \mathbf{u}_1 \cdot \mathbf{u}_2$. Then, we claim that the set 

$$\{ \mathbf{n}_1 \cdot \mathbf{n}_2 \mid \mathbf{n}_1 \cdot \mathbf{u}_1 \ge \epsilon_1,\ \mathbf{n}_2 \cdot \mathbf{u}_2 \ge \epsilon_2 \}$$

is contained in the set 

$$\{ \mathbf{n}_1 \cdot \mathbf{n}_2 \mid \cos[\min(\pi, \gamma_{12} + \alpha_{12})] \le \mathbf{n}_1 \cdot \mathbf{n}_2 \le \cos[\max(0, \gamma_{12} - \alpha_{12})] \}. \qquad (1)$$

A proof of this is found in Appendix I. Figure 3 illustrates this result in two 
dimensions. 

Figure 4. Range of Directions between Sensed Points 

An implementation of angle pruning similar to that used for distance pruning 
is now also possible. For object $O_j$, with $f_j$ faces, we can set up an $f_j$ by $f_j$, lower 
diagonal table $atable_j$ such that $atable_j[\max(i, k), \min(i, k)] = \mathbf{v}_i \cdot \mathbf{v}_k$, where recall 
that $\mathbf{v}_i$ denotes the unit normal to face $i$ in the model. 
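
The three-dimensional angle test built on relation (1) can be sketched as follows. The function names `dot_range` and `normals_consistent` are hypothetical, vectors are assumed to be unit-length 3-tuples, and the cone thresholds are assumed to be cosines in [-1, 1]. In a table-driven version the model dot product would be read from the angle table rather than recomputed.

```python
import math


def dot_range(u1, eps1, u2, eps2):
    """Bounds on n1 . n2 given by relation (1).

    u1, u2 are the unit sensed normals; eps1, eps2 are the cosines of the
    half-angles of their error cones.
    """
    alpha12 = math.acos(eps1) + math.acos(eps2)
    c = sum(a * b for a, b in zip(u1, u2))
    gamma12 = math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding
    lower = math.cos(min(math.pi, gamma12 + alpha12))
    upper = math.cos(max(0.0, gamma12 - alpha12))
    return lower, upper


def normals_consistent(v_i, v_k, u1, eps1, u2, eps2):
    """True if model normals v_i, v_k are compatible with the sensed cones."""
    lower, upper = dot_range(u1, eps1, u2, eps2)
    model_dot = sum(a * b for a, b in zip(v_i, v_k))
    return lower <= model_dot <= upper
```

With sensed normals at right angles and cones of roughly ten degrees each (thresholds near 0.985), the admissible dot products lie within about ±0.342, so a pair of perpendicular faces passes and a pair of parallel faces is pruned.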

2.1.3. Direction Pruning 

Consider a pair of sensed points $P_1$ and $P_2$ and let $u_{12}$ be the unit direction
vector between them. Suppose that we know the measured surface normal at point
$P_1$ to within some cone of error, for example, the measured value is $w_1$, and the
range of possible values for the surface normal is

$$\{ v_1 \mid v_1 \cdot w_1 \ge \epsilon_1 \}.$$

Then the set of possible "angles" between the direction vector and the surface
normal of the face is given by

$$\{ v_1 \cdot u_{12} \mid v_1 \cdot w_1 \ge \epsilon_1 \}.$$

In an interpretation, suppose that point $P_1$ has been assigned to face i, with
normal $n_i$ in the model, and we now consider possible faces k to assign to point $P_2$.
Let the range of possible unit vectors (directions) from face i to face k be denoted
by the cone

$$\{ s_{ik} \mid s_{ik} \cdot t_{ik} \ge \delta_{ik} \}$$

for some pair $t_{ik}$ and $\delta_{ik}$. Figure 4 illustrates this cone in a two-dimensional
example. Appendix II shows how this cone may be computed from models of the
object faces. In the model, the set of possible angles between legal directions and
the surface normal is

$$\{ n_i \cdot s_{ik} \mid s_{ik} \cdot t_{ik} \ge \delta_{ik} \}. \qquad (2)$$

Thus, assume that point $P_1$ is on face i, with normal $n_i$, that we have measured
$w_1$, that we know $\epsilon_1$, and that we have also measured $P_2$. A face k, whose direction
range from face i is given by the pair $(t_{ik}, \delta_{ik})$, is a feasible face for point $P_2$ if the
set in equation (2) intersects the cone

$$\{ v_1 \cdot u_{12} \mid v_1 \cdot w_1 \ge \epsilon_1 \}. \qquad (3)$$

If $\cos\gamma_{ik} = \delta_{ik}$ and $\cos\phi_{ik} = n_i \cdot t_{ik}$, then we know from the derivation in Appendix
I that the set of equation (2) is contained in the set

$$\{ n_i \cdot s_{ik} \mid \cos(\gamma_{ik} + \phi_{ik}) \le n_i \cdot s_{ik} \le \cos(\gamma_{ik} - \phi_{ik}) \}.$$

Similarly, if $\cos\alpha_1 = \epsilon_1$ and $\cos\omega_{12} = w_1 \cdot u_{12}$, then the set of equation (3) is
contained in the set

$$\{ v_1 \cdot u_{12} \mid \cos(\alpha_1 + \omega_{12}) \le v_1 \cdot u_{12} \le \cos(\alpha_1 - \omega_{12}) \}.$$

Therefore, for the pairings of $P_1$ with face i and $P_2$ with face k to be consistent with
the direction constraint, it must be the case that the intersection of the numerical
ranges of dot products is not null, i.e.,

$$[\cos(\alpha_1 - \omega_{12}), \cos(\alpha_1 + \omega_{12})] \cap [\cos(\gamma_{ik} - \phi_{ik}), \cos(\gamma_{ik} + \phi_{ik})] \ne \emptyset.$$

The direction constraint can also be implemented in a form similar to that used
for distance and angle pruning. For object $O_j$, with $f_j$ faces, we can set up an $f_j$
by $f_j$ table $ctable_j$ such that $ctable_j[i, k] = [\cos(\gamma_{ik} - \phi_{ik}), \cos(\gamma_{ik} + \phi_{ik})]$. Again,
the set of all pairs of faces (i, k) on object $O_j$ consistent with some measured ranges
of surface normals is given by

$$\{ (i, k) \mid [\cos(\alpha_1 - \omega_{12}), \cos(\alpha_1 + \omega_{12})] \cap ctable_j[i, k] \ne \emptyset \}.$$

Note that, unlike the distance and angle constraints, the direction constraint is
not symmetric, so before pairing $P_2$ to face k, we must repeat the test above
interchanging the roles of i and k. Similarly, the test must be applied to each
pairing of sensed points and faces in an interpretation.
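
As an illustration, the interval-intersection test for direction pruning might be sketched as follows. The names `cos_interval` and `direction_consistent` are hypothetical, and the clamping of angles to $[0, \pi]$ is an assumption carried over from the form of equation (1); it is not stated explicitly for this constraint in the text.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def angle_between(a, b):
    """Angle between two unit vectors, guarded against rounding error."""
    return math.acos(max(-1.0, min(1.0, dot(a, b))))

def cos_interval(center, half_width):
    """(low, high) range of cos(x) for x within half_width of center.

    Angles are clamped to [0, pi] (an assumption, mirroring equation (1)).
    """
    lo = math.cos(min(math.pi, center + half_width))
    hi = math.cos(max(0.0, center - half_width))
    return lo, hi

def direction_consistent(w1, u12, eps1, n_i, t_ik, delta_ik):
    """True if the sensed range (3) and the model range (2) overlap."""
    sensed = cos_interval(angle_between(w1, u12), math.acos(eps1))
    model = cos_interval(angle_between(n_i, t_ik), math.acos(delta_ik))
    return sensed[0] <= model[1] and model[0] <= sensed[1]
```

In a table-driven version, the `model` interval would be looked up for each face pair (i, k) rather than recomputed, and the test would be run a second time with the roles of the two points exchanged.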

The constraint described above restricts the angle between a surface
normal and unit vectors from one face to another. In addition to constraining the
angles of unit vectors, we may constrain the magnitude of the component along
the surface normal of the vector between the sensed points. The statement and
implementation of the constraint are essentially unchanged, except that $u_{12}$ and $t_{ik}$
are no longer unit vectors but the actual vectors between the sensed points. The
effectiveness of the constraint is in general improved, however, since it now captures
some distance and some angular constraint. The difference between this extended
direction constraint and the simple direction constraint is illustrated in Figure 5.
Two parallel faces (faces 1 and 2 in the figure) displaced relative to each other
give rise to a cone of directions, but a single value for the normal component of
vectors connecting the faces. Note that an interpretation that assigns $P_1$ to face 1
and $P_2$ to face 3 is consistent with all the previously mentioned constraints except
for the extended direction constraint. The figure also illustrates that the extended
direction constraint does not subsume the distance constraint, since direction only
constrains the normal component of distance.

Figure 5. Extended Direction Constraint

There is an alternate form of the direction constraint, useful when no bound 
on the surface normal is available. It can briefly be described as follows. Given two 
faces h and i on an object, we can compute the range of directions between points
on the faces, forming a cone of possible directions. Similarly for faces i and j, we 
can compute the cone of possible directions. The combination of these two cones 
defines a range of possible angles for the triplet of faces h,i,j. 

If an interpretation calls for pairing three of the sensed points with three object 
faces, the angle formed by this triplet of sensed points must be within the range 
of possible angles between the triplets of faces. Note that the angles formed by all 
triplets of sensed points must be consistent, i.e. for three sensed points, there are 
three angles, for k sensed points, there are angles. Hence, this constraint also 
becomes more effective as more sensed points are considered. 

This form of the direction constraint can be used when only vertices and edges
are touched, as it does not require sensing surface normals. Note that this form of
the constraint can also be extended to use the magnitude of the vectors between
sensed points as well as their direction. This form of the direction constraint allows
pruning of the IT for $k \ge 3$. The previous formulation of the constraint allows
pruning of the IT for $k \ge 2$. As well, this form of the constraint would require an
$n^3$ table, as opposed to an $n^2$ one for the previous formulation. Given the size of n
to be expected for typical objects, this is a critical difference.

3. Model Testing

Once the interpretation tree has been pruned by the local constraints, there
will be some set of possible interpretations of the sensed data, each one consisting
of a set of triples $(p_i, n_i, f_i)$, where $p_i$ is the vector representing the sensed position,
$n_i$ is the vector representing the sensed normal, and $f_i$ is the face assigned to
this sensed data for that particular interpretation. In the model test stage of the
processing, we want to

1. determine the actual transformation from model coordinates to sensor
coordinates, corresponding to the interpretation,

2. check that under this transformation, not only are the sensed points
transformed to lie on the appropriate planes, but moreover, that the
sensed points actually lie within the bounds of the assigned faces.

We will assume that a vector in the model coordinate system is transformed
into a vector in the sensor coordinate system by the following transformation:

$$v_s = R v_m + v_0$$

where R is a rotation matrix, and $v_0$ is some translation vector. We need to solve
for R and $v_0$. We note that a solution could be obtained using a least-squares
method, such as is used by [Faugeras and Hebert 83]. This type of solution can be
computationally expensive, however, and in the following sections, we develop an
alternative method.

3.1. Rotation Component 

We consider first the rotation component of the transformation. Consider the
first triple of a particular interpretation, $(p_i, n_i, f_i)$. The sensed normal is given by $n_i$
and corresponding to face $f_i$ is a face normal $m_i$. For R to be a legitimate rotation,
it should take the normal $m_i$ into $n_i$ (ignoring issues of error in the measurements
for now).

Now, any rotation can be represented by a direction about which the rotation
takes place, and an angle of rotation about that direction. What is the set of possible
directions of rotation r consistent with $n_i$ and $m_i$? Any rotation will preserve the
angle between the transformed vector and the direction of rotation. Hence, any
legitimate rotation direction must be equiangular with $n_i$ and $m_i$. Thus, the set of
potential directions is given by

$$\{ r_{ij} \mid r_{ij} \cdot n_i = r_{ij} \cdot m_i \}$$

or equivalently

$$\{ r_{ij} \mid r_{ij} \cdot (m_i - n_i) = 0 \}.$$

That is, $r_{ij}$ is perpendicular to $(m_i - n_i)$.

Now, consider a second triple in the interpretation, $(p_j, n_j, f_j)$, and let $m_j$ be
the normal to face $f_j$. Provided $m_j \ne \pm m_i$ and $n_i - m_i$ is not (anti-)parallel to
$n_j - m_j$, we can constrain $r_{ij}$ to a second set

$$\{ r_{ij} \mid r_{ij} \cdot (m_j - n_j) = 0 \}.$$

Since the rotation is the same, $r_{ij}$ must lie in both sets, i.e., it must be perpendicular
to both vectors. Hence, $r_{ij}$ is given by the unit vector in the direction

$$(m_i - n_i) \times (m_j - n_j)$$

to within an ambiguity of 180°.

This derivation can be recast in geometric terms in the following manner. Any
unit rotation vector r taking $m_i$ into $n_i$ must lie on the perpendicular bisector of the
line connecting $n_i$ to $m_i$. Similarly, it must also lie on the perpendicular bisector
of the line connecting $n_j$ to $m_j$. Since the rotation is the same, it must lie in the
intersection of the two perpendicular bisector planes, as above, and hence is given
by the specified unit vector

$$(m_i - n_i) \times (m_j - n_j).$$

If there were no error in the sensed normals, we would be done. With error included
in the measurements, however, the computed rotation direction r could be slightly
wrong. One way to reduce the effect of this error is to compute all possible $r_{ij}$ as
i and j vary over the faces of the interpretation, and then cluster these computed
directions to determine a value for the direction of rotation r.
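
A minimal sketch of the rotation-direction computation follows, assuming unit normals stored as tuples. The function name `rotation_axis` and the tolerance `tol` are illustrative choices, not part of the original method description; the function returns `None` for the degenerate configurations excluded in the text.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotation_axis(m_i, n_i, m_j, n_j, tol=1e-9):
    """Unit vector along (m_i - n_i) x (m_j - n_j), or None if degenerate.

    The sign of the result is ambiguous (the 180-degree ambiguity noted
    in the text); tol is an assumed numerical threshold.
    """
    c = cross(sub(m_i, n_i), sub(m_j, n_j))
    length = math.sqrt(sum(x * x for x in c))
    if length < tol:
        return None
    return tuple(x / length for x in c)
```

For a 90° rotation about the z axis, which takes the model normals (1, 0, 0) and (0, 1, 0) into (0, 1, 0) and (-1, 0, 0), the sketch recovers a direction along the z axis.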


Once we have computed a direction of rotation r, we need to determine the
angle $\theta$ of rotation about it. It is straightforward to show that (see, for example,
[Korn and Korn, 68] p. 473)

$$m_i = \cos\theta\, n_i + (1 - \cos\theta)(r \cdot n_i)\, r + \sin\theta\, (r \times n_i).$$

Simple algebraic manipulation, using the fact that $r \cdot m_i = r \cdot n_i$, yields

$$\cos\theta = 1 - \frac{1 - (n_i \cdot m_i)}{1 - (r \cdot n_i)(r \cdot m_i)}$$

$$\sin\theta = \frac{(r \times n_i) \cdot m_i}{1 - (r \cdot n_i)(r \cdot m_i)}.$$

Hence, given r, we can solve for $\theta$. Note that if $\sin\theta$ is zero, there is a singularity
in determining $\theta$, which could be either 0 or $\pi$. In this case, however, r lies in the
plane spanned by $n_i$ and $m_i$ and hence, only the $\theta = \pi$ solution is valid.


As before, in the presence of error, we may want to cluster the r vectors, and 
then take the average of the computed values of 9 over this cluster. 


Finally, given values for both r and $\theta$, we can determine the rotation matrix
R. Let $r_x, r_y, r_z$ denote the components of r. Then

$$R = \cos\theta \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} + (1 - \cos\theta) \begin{pmatrix} r_x^2 & r_x r_y & r_x r_z \\ r_y r_x & r_y^2 & r_y r_z \\ r_z r_x & r_z r_y & r_z^2 \end{pmatrix} + \sin\theta \begin{pmatrix} 0 & -r_z & r_y \\ r_z & 0 & -r_x \\ -r_y & r_x & 0 \end{pmatrix}.$$


Note that in computing the rotation component of the transformation, we have
ignored the ambiguity inherent in the computation. That is, there are two solutions
to the problem, $(r, \theta)$ and $(-r, -\theta)$. We assume that a simple convention concerning
the sign of the rotation is used to choose one of the two solutions.
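
The angle recovery and the axis-angle matrix can be sketched together as below. The function names are hypothetical, and the use of `atan2` to combine the cosine and sine expressions is a choice made for this sketch. The sketch follows the displayed identity literally, so the resulting matrix maps $n_i$ onto $m_i$; a rotation taking $m_i$ onto $n_i$ would use $-\theta$ (equivalently, the transpose). That remark describes this sketch only and is not a claim about the authors' implementation.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotation_angle(r, n, m):
    """Angle theta from the cos/sin expressions; assumes r is not parallel to n."""
    denom = 1.0 - dot(r, n) * dot(r, m)
    cos_t = 1.0 - (1.0 - dot(n, m)) / denom
    sin_t = dot(cross(r, n), m) / denom
    return math.atan2(sin_t, cos_t)

def rotation_matrix(r, theta):
    """cos(t) I + (1 - cos(t)) r r^T + sin(t) [r]_x, as a list of rows."""
    c, s = math.cos(theta), math.sin(theta)
    rx, ry, rz = r
    return [
        [c + (1 - c) * rx * rx, (1 - c) * rx * ry - s * rz, (1 - c) * rx * rz + s * ry],
        [(1 - c) * ry * rx + s * rz, c + (1 - c) * ry * ry, (1 - c) * ry * rz - s * rx],
        [(1 - c) * rz * rx - s * ry, (1 - c) * rz * ry + s * rx, c + (1 - c) * rz * rz],
    ]

def apply(R, v):
    return tuple(dot(row, v) for row in R)
```

When several normal pairs are available, each pair yields its own estimate of $\theta$, and these can be averaged as the text suggests.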

3.2. Translation Component 

Next, we need to solve for the translation component of the transformation.
We know that $v_s = R v_m + v_0$, where $v_m$ is a vector in model coordinates, $v_s$ is the
corresponding vector in sensor coordinates, and R has been computed as above.
Given a triple $(p_i, n_i, f_i)$ from the interpretation, let $m_i$ be the normal of face $f_i$,
with offset $d_i$, that is, the face is defined by the set of vectors

$$\{ v \mid v \cdot m_i = d_i \}.$$

Then the point in model coordinates corresponding to $p_i$ is

$$R^{-1}(p_i - v_0)$$

and the following equation holds

$$m_i \cdot (R^{-1}(p_i - v_0)) = d_i$$

or equivalently

$$(R m_i) \cdot (p_i - v_0) = d_i.$$

This equation essentially constrains the component of the translation vector in the
direction of $R m_i$.

Suppose we consider three triplets from the interpretation, $(p_i, n_i, f_i)$, $(p_j, n_j, f_j)$,
and $(p_k, n_k, f_k)$, such that the triple product $m_i \cdot (m_j \times m_k)$ is non-zero (i.e. the
three face normals are independent). Then, we can construct three independent
equations

$$(R m_i) \cdot v_0 = (R m_i) \cdot p_i - d_i$$
$$(R m_j) \cdot v_0 = (R m_j) \cdot p_j - d_j$$
$$(R m_k) \cdot v_0 = (R m_k) \cdot p_k - d_k.$$

Each of these equations constrains a different, independent component of the
translation vector $v_0$, and hence the three equations together determine the actual
vector. Straightforward algebraic manipulation then yields the following solution
for the translation component $v_0$:

$$[m_i \cdot (m_j \times m_k)]\, v_0 = ((R m_i) \cdot p_i - d_i)((R m_j) \times (R m_k))$$
$$\qquad + ((R m_j) \cdot p_j - d_j)((R m_k) \times (R m_i))$$
$$\qquad + ((R m_k) \cdot p_k - d_k)((R m_i) \times (R m_j)).$$

As in the case of rotation, if there is no error in the measurements, then we are done.
The simplest means of attempting to reduce the effects of error on the computation
is to average $v_0$ over all possible trios of triplets from the interpretation. Note that
for numerical stability, one may want to restrict the computation to triplets such
that $m_i \cdot (m_j \times m_k)$ is greater than some threshold.
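
The closed-form translation solution can be illustrated with a short sketch. The function name `solve_translation`, the `(normal, offset)` face representation and the `threshold` default are assumptions made for this example only; the threshold is applied to the magnitude of the triple product.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def matvec(R, v):
    return tuple(dot(row, v) for row in R)

def solve_translation(R, faces, points, threshold=1e-6):
    """Solve the three plane equations for v0.

    faces  : three (m, d) pairs, model face normal and offset
    points : the three sensed positions paired with those faces
    Returns None when the normals are too close to dependent.
    """
    a = [matvec(R, m) for m, _ in faces]                  # R m_i, R m_j, R m_k
    rhs = [dot(a[t], points[t]) - faces[t][1] for t in range(3)]
    triple = dot(a[0], cross(a[1], a[2]))                 # equals m_i . (m_j x m_k)
    if abs(triple) < threshold:
        return None
    terms = [cross(a[1], a[2]), cross(a[2], a[0]), cross(a[0], a[1])]
    return tuple(sum(rhs[t] * terms[t][c] for t in range(3)) / triple
                 for c in range(3))
```

Averaging the result over several trios of triples, as suggested above, would simply call this function once per trio and skip the `None` results.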

Finally, we have computed the transform $(R, v_0)$ from model coordinates
to sensor coordinates. To check a possible interpretation, we consider all triples
$(p_i, n_i, f_i)$ in the interpretation and compute

$$R^{-1}(p_i - v_0).$$

We then check that this point lies within the bounds of face $f_i$ (to within some
error range). If it does not, then the interpretation is invalid, and may be pruned.
If all such triples satisfy this check, the interpretation is still valid.

We have assumed above that three independent face normals have been
measured. When only one normal is available, neither the rotation nor the translation
can be determined. When only two independent normals are available, the rotation
can be determined as before, but only a direction of translation can be determined,
not the actual magnitude of the translation. A range of possible translations can
be determined, however, by intersecting the line, determined by the position of a
sensed point and the translation direction, with the face assigned to the point by
the interpretation. Of course, further sensing along this line to discover the position
of the edge would determine the actual translation.

After the model test has been applied to all leaves of the interpretation tree, 
there may still be several interpretations remaining. Upon examination, one usually 
finds that these interpretations differ only in the assignment of one or two faces, 
all other faces being identical. This inability to distinguish between such nearly 
identical interpretations is a result of the error bounds on the sensing. Thus, as 
a final stage, we cluster the remaining interpretations in terms of their computed 
transformations, that is, we cluster the interpretations in terms of the computed 
orientation of the object in space. Here, we generally find very few such clusters. 
Indeed, in general there is only one computed orientation for the object, (the correct 
one), although occasionally two or more clusters survive, usually corresponding to 
symmetric interpretations of the sensed data. 
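
One simple way to group surviving interpretations by orientation is a greedy pass over their rotation directions, sketched below. The greedy scheme and the function name `cluster_by_axis` are assumptions made for illustration, since the text does not specify the clustering procedure; the default tolerance of 1.5° mirrors the threshold quoted for the simulations in Section 4.3.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cluster_by_axis(axes, max_angle_deg=1.5):
    """Greedily group unit rotation directions.

    A direction joins the first cluster whose seed (first member) lies
    within max_angle_deg of it; otherwise it starts a new cluster.
    """
    cos_tol = math.cos(math.radians(max_angle_deg))
    clusters = []
    for axis in axes:
        for cluster in clusters:
            if dot(axis, cluster[0]) >= cos_tol:
                cluster.append(axis)
                break
        else:
            clusters.append([axis])
    return clusters
```

The number of clusters returned plays the role of the "number of distinct transformations"; as in the text, differences in rotation angle and translation are ignored by this sketch.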

4. Simulation Data 

In order to test the efficacy of the algorithm in pruning the interpretation 
tree, we ran a large number of simulations. Some simulations for objects with three 
degrees of freedom (two translational and one rotational) have been described in 
[Gaston and Lozano-Perez 83]. We include additional simulation data for objects 
with three positional freedoms, including the direction constraint. We also provide 
data for the more general case of three-dimensional objects with six degrees of 
freedom. 

Our goals are first to demonstrate that effective pruning of the interpretation 
tree is possible, at low computational expense, and second to explore the sensitivity 
of the algorithm to error in measuring the surface normal and the position of the 
sensed points.

Figure 6. 2D Test Models



4.1. Three Positional Freedoms 

We begin by considering objects with two degrees of translational freedom 
and one degree of rotational freedom, using sample objects first considered in 
[Gaston and Lozano-Perez 83], illustrated in Figure 6. The addition of the direction
constraint greatly reduces the extent of the set of possible interpretations. To 
demonstrate this, a series of 250 runs of the algorithm was executed for each of the 
objects. Each run determined the number of interpretations consistent with a set of 
5 sensed points. The points were determined by first randomly rotating the object 
about its centroid and then intersecting the object with five lines from its centroid 
along five evenly spaced directions. The points of intersection farthest from the 
centroid along each line were used as the sensed point. The (simulated) error in 
measuring the sensed position was bounded by 0.1 (i.e. a randomly oriented offset 
vector of random magnitude bounded by 0.1 was added to the point on the object), 
and the (simulated) error in measuring the angle of the surface normal was f (i.e. a 
random vector was chosen whose dot product with the actual normal was bounded 
by cos -1 !). To place these error ranges in perspective, the diameters of the models 
in Figure 6 were 9, 14 and 12 units for the wrench, gator and hand respectively.
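
The simulated measurement error described above might be generated along the following lines for the two-dimensional case. This is a sketch under assumptions: the uniform sampling of magnitude and angle, the function names, and the passing of the normal error bound as a parameter are all choices made here, since the text does not specify the distributions used.

```python
import math
import random

def perturb_point(p, bound, rng):
    """Add a randomly oriented offset of random magnitude <= bound to a 2D point."""
    direction = rng.uniform(0.0, 2.0 * math.pi)
    magnitude = rng.uniform(0.0, bound)
    return (p[0] + magnitude * math.cos(direction),
            p[1] + magnitude * math.sin(direction))

def perturb_normal(n, max_angle, rng):
    """Rotate a 2D unit normal by a random angle in [-max_angle, max_angle]."""
    delta = rng.uniform(-max_angle, max_angle)
    c, s = math.cos(delta), math.sin(delta)
    return (c * n[0] - s * n[1], s * n[0] + c * n[1])
```

Passing an explicit `random.Random` instance as `rng` makes a series of trial runs reproducible.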

The following table describes the results of this set of simulations, by
histogramming the number of interpretations found. Thus, for $i \le 10$, the number
in the i-th column is the number of trial runs which resulted in i possible
interpretations. Beyond this point, the histogram is compressed into units of tens.
For example, the column labelled 20 lists the number of trial runs resulting in k
interpretations, where $10 < k \le 20$. In order to examine the effectiveness of adding
the direction constraint to the algorithm described in [Gaston and Lozano-Perez
83], the simulations were run both with and without this constraint. For each object
in the table, the first histogram corresponds to the case of using the direction
constraint, and the second histogram to the case of not using it. Note that the
number of edges for the wrench (W), gator (G) and hand (H) is 12, 50 and 67
respectively.

The results are striking in a number of different ways. First, note that the
maximum number of possible interpretations observed for any of the objects was
20 (in the case of using the direction constraint), which is exceptionally low when
considering that the total number of possible interpretations for the gator was
$50^5$ or 312,500,000. Second, the median number of possible interpretations was
only 2 for the wrench, and 4 for the gator and hand, when using the direction
constraint. Without this constraint, the median number of interpretations rose to
48, 12 and 9 for the wrench, gator and hand, respectively. Of course, the results of
the simulations will depend to a certain extent on the error ranges, a point that
will be explored in some detail in the next section. We note that a tenth of an inch
sensitivity in distance over a 10 to 20 inch range is within the range of current
tactile sensors. The positioning accuracy of many current manipulators is within
0.01 inches, the Purbrick tactile sensor has a matrix element separation of 0.06
inches, and the Hillis sensor has an element separation of 0.025 inches.

4.2. Six Positional Freedoms 

When considering the full three-dimensional problem of objects with six degrees 
of freedom, we have run extensive simulations on the models illustrated in Figure 7.
The diameters of these objects (that is the maximum separation of two points on 
the object) were roughly 4, 7, 8 and 8 inches for the housing, stapler, simple hand 
and complex hand respectively. In running simulations of the recognition algorithm 
on these objects, we have used two different sensing strategies, reflecting in part 
the difference between range and tactile sensing capabilities. 

It should be noted that in all the following simulations, the efficiency of the 
tree pruning mechanism was improved by sorting the sensed points. In particular, 
rather than using the sensory data in arbitrary order, the points were sorted on the 
basis of pairwise separation, with the more distant points being ordered first. This 
sorting on distance tends to place the most effective constraints at the beginning 
of the process, a point that will be illustrated in Section 4.5. 
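
The text does not spell out the sorting rule, so the following is only one possible reading: a greedy farthest-first ordering that starts with the most widely separated pair of points and then repeatedly appends the point farthest from those already chosen. The function name `order_by_separation` is hypothetical.

```python
import math
from itertools import combinations

def order_by_separation(points):
    """Order points so that widely separated ones come first (greedy sketch)."""
    n = len(points)
    if n < 3:
        return list(points)
    # Start with the most distant pair.
    i, j = max(combinations(range(n), 2),
               key=lambda pair: math.dist(points[pair[0]], points[pair[1]]))
    order = [i, j]
    remaining = [t for t in range(n) if t not in (i, j)]
    while remaining:
        # Next: the point whose nearest already-chosen neighbour is farthest away.
        nxt = max(remaining,
                  key=lambda t: min(math.dist(points[t], points[c]) for c in order))
        order.append(nxt)
        remaining.remove(nxt)
    return [points[t] for t in order]
```

Placing distant points first means that the distance and direction tests applied near the root of the interpretation tree involve the longest baselines available.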

4.3. Grid Sensing 

Figure 7. 3D Test Models

In the first sensing method, the sensory data were generated by projecting a
regular grid of points along three orthogonal directions, and noting where contact
was made with an arbitrarily oriented model of the object. This arbitrary orientation
was obtained by randomly choosing values for the three Euler angles, computing
a rotation transformation based on this and applying the rotation to the model.
Note that this does not produce a uniform sampling of the space of rotations, but
for our purposes it is a sufficiently random sampling. No translation offset was
added, since this would not affect the process. The three-dimensional positions of
the sensed points and the associated surface normals were then corrupted by noise
within some specified bounds. For the simulations discussed below, the number of
sensed points on each trial varied between 12 and 20.
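
A random orientation of the kind described can be sketched as follows. The Euler-angle convention (rotations about z, then y, then x), the sampling of each angle uniformly on $[0, 2\pi)$ and the function names are assumptions of this sketch, since the text does not state which convention was used. As the text notes, sampling Euler angles in this way does not give a uniform distribution over rotations.

```python
import math
import random

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(3)) for j in range(3)]
            for i in range(3)]

def random_euler_rotation(rng):
    """Rotation matrix built from three independently sampled Euler angles."""
    a, b, c = (rng.uniform(0.0, 2.0 * math.pi) for _ in range(3))
    return matmul(rot_z(a), matmul(rot_y(b), rot_x(c)))
```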

The results of the first set of simulations are shown in Tables II, III, and IV.
Table II lists statistics of the number of interpretations in the tree following local 
pruning, for a variety of sensing accuracies. Each simulation consisted of 100 trials, 
and the minimum and maximum number of interpretations are recorded over this 
set of trials, as well as the 50th and 90th percentile of the distribution of number 
of interpretations. Table III lists statistics of the number of interpretations in the 
IT that survive an explicit model test. It was observed at this stage that while the 
number of interpretations was not reduced to 1, as might be expected, the surviving 
interpretations generally tended to differ only in one or two faces. Moreover, the 
computed transformation parameters were nearly identical, indicating that the 
multiple interpretations surviving a model test actually corresponded to a single 
interpretation, to within the error ranges of the algorithm. Thus, Table IV lists 
statistics of the number of separate transformations computed for each trial. In 
particular, transformations whose direction of rotation differed by more than 1.5° 
were judged to be different, yielding a very tight clustering of the computed 
transformations. This clustering ignores differences in the translation component, 
a point that is addressed later in Table VI.

Table II - No. of Interpretations After Local Pruning

Object        Normal        Dist          Min           50th          90th          Max           Faces
Housing       π/15          [illegible]   1             [illegible]   [illegible]   [illegible]   40
                            [illegible]   1             [illegible]   [illegible]   [illegible]   40
                            .10           2             40            208           658           40
              π/10          .01           2             4             16            240           40
                            .05           2             16            40            256           40
                            .10           6             96            410           1618          40
              π/8           [illegible]   2             8             28            96            40
                            [illegible]   2             24            108           576           40
                            .10           10            156           1144          3576          40
Simple Hand   π/12          .01           4             4             8             16            28
                            .05           4             8             16            24            28
              π/10          .01           4             4             12            64            28
                            .05           4             8             16            96            28
              π/8           .01           4             8             16            [illegible]   28
                            .05           4             8             24            96            28
Stapler       [illegible]   .01           2             8             32            72            34
                            .05           2             52            204           772           34
              [illegible]   [illegible]   2             14            68            276           34
                            .05           [illegible]   132           1104          2856          34
Complex Hand  [illegible]   .01           2             24            120           [illegible]   64
                            [illegible]   [illegible]   [illegible]   [illegible]   [illegible]   64
              [illegible]   [illegible]   [illegible]   [illegible]   [illegible]   3456          64
                            .05           12            144           496           4416          64


In the table above, the normal column lists the radius of the error cone about 
the measured surface normal; the dist column lists the error range of the distance 
sensing; the min and max columns list the minimum and maximum number of 
interpretations observed; the 50th column lists the median point of the set of 
simulations; the 90th column lists the 90 th percentile of the set of simulations; and 
the faces column lists the number of faces in the model.

Table III - No. of Interpretations After Model Test

Object        Normal        Dist          Min           50th          90th          Max           Faces
Housing       π/15          .01           1             2             4             [illegible]   40
                            .05           1             4             16            36            40
                            .10           1             24            [illegible]   384           40
              [illegible]   .01           1             2             8             120           40
                            .05           1             [illegible]   20            228           40
                            .10           3             [illegible]   136           434           40
              π/8           .01           [illegible]   4             14            42            40
                            .05           [illegible]   12            44            190           40
                            [illegible]   2             57            264           936           40
Simple Hand   π/12          .01           [illegible]   2             4             8             28
                            .05           2             4             7             12            28
              [illegible]   .01           2             3             8             40            28
                            .05           2             4             12            40            28
              [illegible]   [illegible]   [illegible]   4             8             10            28
                            [illegible]   [illegible]   4             12            48            28
Stapler       π/12          .01           1             4             18            49            34
                            .05           1             30            112           386           34
              π/10          .01           1             6             36            138           34
                            .05           4             68            483           2148          34
Complex Hand  [illegible]   [illegible]   1             12            78            448           64
                            [illegible]   4             92            426           [illegible]   64
              [illegible]   .01           [illegible]   [illegible]   [illegible]   [illegible]   64
                            .05           [illegible]   [illegible]   336           2208          64


Table IV - No. of Transforms After Clustering

Object        Normal        Dist    Min    50th   90th   Max    Faces
Housing       π/15          .01     1      1      1      2      40
                            .05     1      1      1      2      40
                            .10     1      1      2      12     40
              π/10          .01     1      1      1      2      40
                            .05     1      1      1      6      40
                            .10     1      1      2      6      40
              π/8           .01     1      1      1      2      40
                            .05     1      1      2      4      40
                            .10     1      1      2      6      40
Simple Hand   π/12          .01     2      2      2      4      28
                            .05     2      2      2      3      28
              π/10          .01     2      2      2      4      28
                            .05     2      2      2      4      28
              π/8           .01     2      2      3      4      28
                            .05     2      2      3      4      28
Stapler       [illegible]   .01     1      1      2      4      34
                            .05     1      1      2      4      34
              π/10          .01     1      1      2      4      34
                            .05     1      2      3      5      34
Complex Hand  π/12          .01     1      2      3      4      64
                            .05     1      2      4      6      64
              π/10          .01     1      2      4      4      64
                            .05     1      2      4      7      64

Figure 8. Simple Hand Histograms. [...] the error cone about the sensed normal, in radians, and a bound on the magnitude
of the position error, in inches.

The first point to stress is that all of these numbers are remarkably low, given
that the total number of possible interpretations of 15 sensed points on an object
with 40 faces is $40^{15}$, or roughly $1.074 \times 10^{24}$. Thus, the local geometric constraints are
very effective in reducing the combinatorics of feasible interpretations.

As might be expected, the number of interpretations in all three tables tends to
rise with increasing error in the measured parameters. The distributions also tend
to be strongly clustered near the low end of the scale, with a very shallow tail on the
high end of the distribution. Thus, while the maximum number of interpretations
can be high (e.g. 3576 for a surface normal error cone of $\pi/8$ and a distance error of
0.10), the median point and even the 90th percentile of the distribution are generally
much smaller. Sample distributions for the number of interpretations surviving tree
pruning are shown in Figure 8.

One reason that the maximum number of feasible interpretations can be
significantly larger than the median of the distribution is the occasional occurrence
of dependent sensor information. For example, if most of the sensed points happen
to lie on a single face, the amount of independent information about the object's
position is much smaller than when the same number of sensed points lie on
different faces. While the sensing strategy used here will reduce the probability
of this occurring, there is still a nonzero chance of such redundant sensing taking
place, resulting in an occasional case of a large number of feasible interpretations.


The probability of such redundant sensing is also to a certain extent dependent 
on the shape of the object. For example, note that the aspect ratio of the stapler 
is much longer than that of the motor housing. This would tend to suggest that 
a regular sensing strategy is more likely to yield redundant information for the 
stapler than the housing. Indeed, a comparison of the appropriate sections of Table 
II shows that under similar conditions in measurement error, the number of feasible 
interpretations of the stapler is much higher than the motor housing, even though 
the stapler has fewer faces. This is partly due to redundant sensing and also partly 
due to symmetric interpretations of the data. 


The number of distinct transformations is almost always 1 in these simulations. 
It was also observed that the computed transformation was generally very close to 
the actual one. For example, each row of Table V illustrates the average error in the 
computed transformations over 100 runs of the algorithm. The direction column 
lists the average angle between the correct and the computed direction of rotation, 
the angle column lists the average angle between the correct and the computed 
magnitude of rotation about the rotation direction, and the translation column 
lists the average magnitude of the difference between the correct and the computed 
translation component of the transformation. It can be seen from the table that 
the average error is remarkably low, generally on the order of 2-3 degrees, even for 
different objects and different amounts of sensor error. As might be expected, the 
average error does tend to rise with increases in the sensor error. In no case did 
the algorithm discard the correct interpretation. Note that the errors illustrated in 
Table V were recorded from the difference between the correct transformation and 
the computed transformation for the correct interpretation. There will be other, 
erroneous interpretations, with much larger differences between the computed and 
correct transformation.

Table V - Average Errors in Computed Transformation

Object        Normal        Dist          Direction (deg.)   Angle (deg.)   Translation
Simple Hand   π/12          .01           2.17               2.33           0.08
                            .05           2.08               2.62           0.09
              [illegible]   [illegible]   3.13               2.93           0.11
                            [illegible]   3.58               3.15           0.12
              π/8           .01           4.43               3.64           0.16
                            .05           5.26               3.03           0.17
              [illegible]   .05           2.18               2.17           0.11
              [illegible]   .01           3.42               3.70           0.12
                            .05           3.64               3.22           0.14
                            .10           3.77               3.62           0.20
              [illegible]   .05           4.28               5.07           0.19
Stapler       π/12          [illegible]   2.15               [illegible]    0.11
              [illegible]   [illegible]   [illegible]        [illegible]    0.11



In the few cases in which more than one transformation was found, two factors 
are generally observed. The first is that the noise in the measured data can result 
in transformations differing by only a few degrees, although these transformations 
are counted as being distinct. The second, more interesting, factor is the 
possibility of symmetric interpretations of the data, for example, due to a rotation 
of the object relative to the sensor. Consider first the case of a completely 
symmetric object, such as the simple hand, which has a rotational symmetry of 180°. 
Here, the algorithm always found at least two distinct transformations of the model 
that were consistent with the sensed data. For objects such as the motor housing, 
portions of the object are symmetric, for example, the base of the housing, ignoring 
the projecting lip. If all the sensed points happen to fall only on such a portion 
of the object, then symmetric interpretations of the data are possible. In general 
these symmetric interpretations account for most of the cases of multiple 
transformations, especially when the sensor error is small. The few remaining cases 
arise when the error in the measurements yields two nearly identical (i.e. differing 
by only a few degrees of rotation) transformations that account for the data. As the 
error in the measured data decreases, these multiple interpretations tend to 
disappear. 

The simulation data listed in Table IV are derived from a clustering of the 
interpretations based strictly on the rotation component of the transformation; 
that is, two transformations whose directions of rotation differed by less than 
1.5° were considered to be part of the same cluster. This clustering technique, 
while very tight in the rotation component, ignores possible differences in the 
translation component of the transformation. To examine such differences, a number 
of the simulations were rerun, using a clustering of the interpretations with a 
rotation sensitivity of 1.5° and a translation sensitivity of either 0.05 or 0.01. 
The number of distinct transformations under this clustering scheme is indicated in 
Table VI. Note that while the number of distinct transformations does increase 
relative to the corresponding entries in Table IV, the change is not significant. 
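
One way to realize such a clustering is a greedy pass that compares each 
transformation with one representative per cluster. The sketch below assumes this 
greedy scheme (the clustering procedure itself is not specified here), represents 
a transformation as a pair of rotation direction and translation vector, and uses 
the hypothetical name cluster_transforms. The thresholds of 1.5 degrees and 0.05 
or 0.01 are those quoted above. 

```python
import math


def angle_between_deg(a, b):
    """Angle in degrees between two direction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    c = max(-1.0, min(1.0, dot / (na * nb)))
    return math.degrees(math.acos(c))


def cluster_transforms(transforms, rot_tol_deg=1.5, trans_tol=None):
    """Greedy clustering of (direction, translation) pairs.

    A transform joins the first cluster whose representative lies within
    rot_tol_deg in rotation direction and, if trans_tol is given, within
    trans_tol in translation. Otherwise it starts a new cluster.
    Returns the list of cluster representatives.
    """
    representatives = []
    for axis, trans in transforms:
        for rep_axis, rep_trans in representatives:
            close_rot = angle_between_deg(axis, rep_axis) < rot_tol_deg
            close_trans = (trans_tol is None
                           or math.dist(trans, rep_trans) < trans_tol)
            if close_rot and close_trans:
                break
        else:
            representatives.append((axis, trans))
    return representatives
```

With trans_tol left at None the count corresponds to clustering on rotation only. 
Supplying a translation sensitivity can only keep the number of clusters the same 
or increase it. 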
Table VI - No. of Transforms After Clustering 

Object        Normal  Dist  Cluster  Min  50th         90th  Max 
Housing       π/10                   1    2            4     6 
                                     1    1            2     4 
Simple Hand   π/10    .05   .01      2    [illegible]        16 
                            .05      2    [illegible]        12 
Stapler       π/10    .01                 4            14    57 
                                          2            4     32 
Complex Hand  π/10    .01            1    5            15    20 
                            .05      1    3            6     16 


4.4. Random Sensing 


All of the previous simulations have generated the sensed data by projecting 
a regular grid of points along three orthogonal directions, generally resulting in 
between 12 and 20 contact points. Such a sensing strategy would be consistent with 
visual sensing modalities. A second set of simulations has been run using a sensing 
strategy more consistent with tactile sensors. Consider a set of three mutually 
orthogonal, directed rays, which intersect at a point. Suppose this point is taken 
to be some arbitrary point (x, y, 0), chosen on the x-y plane (note that by the 
definition of the object models, this plane will intersect the object). Each ray was 
traced along its preferred direction (with decreasing z component) until either the 
object or the support plane was contacted. This operation was repeated for several 
different approaches, using randomly generated values of x and y, until between 7 
and 9 different contact points were made on the object. Tables VII, VIII and IX 
summarize the results of running sets of simulations using sensory data generated 
in this fashion. 


Table VII - No. of Interpretations After Local Pruning 

Object        Normal  Dist  Min  50th  90th  Max   Faces 
Simple Hand   π/10    .01   2    4     20    90    28 
                      .05   2    8     44    300   28 
              π/8     .01   2    8     48    444   28 
                      .05   2    12    84    320   28 
Housing       π/10    .01   2    10    70    946   34 
                      .05   2    32    124   1234  34 
              π/8     .01   2    14    74    284   34 
                      .05   2    62    406   4053  34 


Table VIII - No. of Interpretations After Model Test 

Object        Normal       Dist         Min          50th  90th  Max  Faces 
Simple Hand   π/10         [illegible]  2            4     12    60   28 
              [illegible]  [illegible]  2            4     24    116  28 
              [illegible]  .01          2            4     24    98   28 
                                        2            7     39    160  28 
Housing       [illegible]  [illegible]  [illegible]  4     32    516  34 
              [illegible]  [illegible]  [illegible]  16    80    606  34 
              π/8          .01                       8     26    136  34 
                           .05          [illegible]  32    164   377  34 


Table IX - No. of Transforms After Clustering on Rotation 

Object        Normal       Dist  Min  50th  90th  Max          Faces 
Simple Hand   [illegible]             2     4     [illegible] 
                                      2     6     [illegible]  28 
              π/8                2    2     6     14           28 
                                 2    3     8                  28 
Housing       π/10                    1     4                  [illegible] 
                                      2     7 
              π/8          .01   1    1     5     9            34 
                           .05   1    3     10    14           34 


As in the case of the earlier simulations, the effectiveness of the local 
constraints in reducing the number of feasible interpretations is clearly demonstrated. 
Interestingly, the number of distinct transformations tends to be somewhat higher 
than in the earlier cases, especially for the motor housing. This results in part 
from the following situation. With the exception of one projecting portion (see 
Figure 6), the housing is essentially a symmetric object with respect to two 
different axes. As a consequence, if the sampled data points do not lie on this 
distinguishing projection, there could be several consistent, symmetric 
interpretations of the data. In the case of sensory sampling on a regular grid of 
points, it is likely that at least one point will lie on this projection, and the 
symmetric ambiguity will not arise. In the case of fewer sample points, generated by 
random approaches to the object, it is much more likely that the feasible 
transformations will reflect this symmetry, and thus be higher in number. 

In cases of ambiguity in interpretation, for example, when several orientations 
of the motor housing are consistent with the sensed data due to a partial symmetry 
of the object, it would be useful to have effective means for distinguishing between 
the possible solutions. A straightforward method would be to add sensory points 
generated at random until only one interpretation is consistent. This, of course, 
could be very inefficient, since it could take the addition of several points before 
a solution is found. In the case of the motor housing, for example, one would 
need to consider additional sensory points until one lying on the projecting lip 
of the housing is recorded. A more effective solution is to use the difference in 
feasible interpretations to find directions along which the points of contact of the 
different interpretations are widely separated. Such directions then constitute good 
candidates for generating the next sensed point [Gaston and Lozano-Perez 83]. 
Extensions of the method to the six degree of freedom problem are currently under 
investigation. 


4.5. Tree Pruning 


Tables X and XI contain a final set of statistics that demonstrates the 
effectiveness of the local constraints in reducing the number of feasible 
interpretations in the IT. The regular grid approach is used to generate the sensory 
data. For the data in Table X, the points are sampled in random order as the IT is 
generated and pruned. For the data in Table XI, the sensed points are sorted on the 
basis of pairwise separation, with the more distant points being ordered first. This 
sorting on distance tends to place the most effective constraints at the beginning 
of the process. Since the point of the local constraints is to prune the IT as 
efficiently as possible, applying the most effective constraints first should result 
in pruning out entire subtrees at as early a stage in the tree generation process as 
possible. Using the sorted sense data, the interpretation tree was generated and 
pruned. Tables X and XI list statistics of the number of interpretations at each 
level of the tree (i.e. the number of k-interpretations for different values of k), 
based on trials of 100 simulations each. 
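
The exact sorting rule is not spelled out beyond "more distant points first". One 
plausible reading, shown here purely as an assumption, starts with the most widely 
separated pair and then repeatedly adds the point farthest from those already 
chosen. The function name is hypothetical. 

```python
import math
from itertools import combinations


def order_by_separation(points):
    """Order sensed points so that widely separated points come first.

    Assumed rule: begin with the farthest pair, then greedily append the
    remaining point whose minimum distance to the chosen points is largest.
    """
    if len(points) < 2:
        return list(points)
    i, j = max(combinations(range(len(points)), 2),
               key=lambda ij: math.dist(points[ij[0]], points[ij[1]]))
    order = [i, j]
    remaining = set(range(len(points))) - {i, j}
    while remaining:
        nxt = max(remaining,
                  key=lambda r: min(math.dist(points[r], points[o])
                                    for o in order))
        order.append(nxt)
        remaining.remove(nxt)
    return [points[k] for k in order]
```

Under this rule the first distance constraint applied in the tree is the one with 
the largest measured separation. 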


Table X - Feasible Interpretations - Unsorted Points 

Points  Min  50th  90th  Max 
2       12   96    334   432 
3       4    110   388   678 
4       4    55    373   675 
5       4    40    244   1000+ 
6       4    26    189   1000+ 
7       2    24    108   686 
8       2    20    82    636 
9       2    20    76    520 
10      2    20    72    336 
11      2    16    62    280 
12      2    16    64    200 
13      4    20    64    304 
14      2    18    72    304 
15      2    20    80    304 

Table XI - Feasible Interpretations - Sorted Points 

Points  Min  50th  90th  Max 
2       4    18    48    94 
3       2    18    38    82 
4       2    12    29    68 
5       2    8     22    50 
6       2    8     24    58 
7       2    8     24    5[illegible] 
8       2    8     24    72 
9       2    8     32    72 
10      2    8     24    80 
11      2    8     34    192 
12      2    8     36    352 
13      2    12    32    512 
14      2    12    40    480 
15      2    12    48    288 


It can be seen that the median number of feasible interpretations is quite small 
at all levels of the tree, even as the number of contact points is increased. This 
data implies that one of the strengths of the approach is the ability to prune out 
whole subtrees of the IT at a very early stage, thereby ensuring that the total 
number of tests to be applied is significantly smaller than the size of the entire 
tree. This leads to very efficient processing of the feasible interpretations. 

Sorting the points on distance is extremely effective, as can be seen from the 
results reported in Table XI for the same set of runs as those in Table X, but where 
the points were sorted prior to pruning. The effect on running times of the pruning 
program is also quite drastic. 

5. Performance on Range Data 

We have performed limited testing of the algorithms described above using 
high-quality range data obtained from a laser-based triangulation system developed 
by Philippe Brou at our laboratory. Two samples of the data we used are shown in 
Figure 9. The data is obtained at high resolution, approximately 0.04 centimeter 
grid spacing along x and 0.08 centimeter along y. A small number of points were 
obtained from the dense data by choosing points where a least-squares fit to a 
plane over a 5 × 5 patch produced very low normalized residual errors. Points were 
chosen that included at least three independent normals. Note that the actual 
object includes a protrusion that was not present in the model; no data was taken 
from that region. In the data from Figure 9(a), eleven points were used; in the data 
from Figure 9(b), nine points were used. The accuracy bounds we employed were 
±0.02 inch position accuracy and [illegible] accuracy in measuring the normal. 

Figure 9 shows the results obtained from running the algorithm on the data 
described above. There were only 9 and 11 interpretations, respectively, left in the 
tree after pruning with the local constraints. From these, three valid transformations 
were found in one case and two in the other; they are shown in the figure. The 
correct transformation was found each time. The other transformations correspond 
to rotations that place the sensed points on parallel faces. Note, however, that 
disambiguation between the valid transformations would be straightforward once 
the transformations are known. 

The quality of the data used in the experiments illustrated in Figure 9 
corresponds to nearly the best error conditions used in the simulations. Runs 
with larger error bounds, using data from sections where the data is less accurate, 
gave results similar to those in the simulations, i.e., more legal interpretations 
in the tree and more valid transformations, but always including the correct one. 
This tends to reinforce the validity of the conclusions drawn from the simulations. 

6. The Combinatorics of Pruning the IT 

In the previous sections, we have outlined the basic interpretation algorithm. 
The crucial issue that determines the viability of this algorithm is the effectiveness 
of pruning the interpretation tree. Our goal has been to demonstrate that one can 
use simple local constraints to prune the interpretation tree, so that only a few 
of the relatively expensive model tests need to be made. The simulation results, 
under a variety of conditions, and the results on range data provide support for 
this claim. 

It is also possible to provide a combinatorial analysis of the pruning of 
interpretation trees provided by local constraints. A detailed presentation of such 
an analysis is contained in a companion paper [Grimson and Lozano-Perez 83]. 
Here, we demonstrate the scope of the combinatorial analysis by presenting a 
detailed discussion of the use of the distance constraint in pruning interpretation 
trees. Similar results hold for the other constraints. 

We stress that the results given below are actually weak bounds on the number 
of interpretations to be expected after pruning. In practice, numbers close to these 
bounds are observed only when the sensors are arranged so as to obtain a minimum 
of information about the object. 

6.1. Combinatorics of Distance Pruning 

We will consider the case in which all faces (or edges in the two-dimensional 
case) have the same size, and derive bounds on the expected pruning of the IT. 

Assume we have some arbitrary labelling of the faces from 1 to n (for example, 
in the two-dimensional case, based on arc length from some starting point). For 
each pair of faces, i and j, let $d_{ij}$ denote the separation of the midpoints of 
the faces. Let $\epsilon_{ij}$ be an upper bound on the range of variation in 
distance, for different sensed points on the two faces, i.e. let $\epsilon_{ij}$ be 
such that 

$$ d_{ij} - \epsilon_{ij} \le |x - y| \le d_{ij} + \epsilon_{ij}, \quad \forall x \text{ on face } i,\ \forall y \text{ on face } j, $$

where $|x - y|$ is the distance between point x on face i and point y on face j. Let 
$\epsilon$ be defined as the maximum over all i, j of $\epsilon_{ij}$ plus some 
estimate of the maximum error of the sensed distance. 


Now assume that we have recorded the position of two sensor points, $P_1$ and 
$P_2$, and let $s_{12}$ be the measured distance between them. Assume that the first 
point has been arbitrarily assigned to some face i of the object. We want to 
determine how many faces j of the object can consistently be assigned to the second 
point, given the separation $s_{12}$ and the known distribution of distances. 
Moreover, we want to be able to continue this for k sensor points, determining an 
upper bound on the number of assignments of faces to sensor points that are 
consistent with the sensed separation between the faces. 

Let the distribution of faces with respect to face i as a function of distance be 
denoted by $\rho_i(s)$. In other words, $\rho_i(s)$ records the number of faces whose 
midpoint separation from face i is given by the distance s. As a consequence, 

$$ \int_{s=0}^{d} d\rho_i(s) = n $$

where n is the total number of faces, and d is the diameter, or maximum separation 
of the object. Note that because $d\rho_i$ is a distribution, this is a 
Lebesgue-Stieltjes integral. The following bound on the number of nodes at the kth 
level of the IT holds for both two-dimensional and three-dimensional objects. 

Proposition 1: An upper bound on the expected number of nodes at the kth 
level of the interpretation tree, k > 2, is given by 

$$ n \left( \frac{2 \epsilon n}{d} \right)^{k-1} $$

where d is the diameter of the object, and $\epsilon$ is a bound on the distance 
sensitivity of the model. 


Proof: The proof proceeds by considering an iterative application of the 
expected maximum branching factor at each level of the tree. We assume that 
$b_{k-1}$ denotes a bound on the number of consistent nodes at the (k-1)st level of 
the interpretation tree, and consider the branching factor obtained when adding a 
kth sensed point. Assume that sensor point $P_{k-1}$ has been assigned to face i, 
and that the measured separation of sensor points $P_{k-1}$ and $P_k$ is $s_k$. This 
implies that the midpoint separation of the corresponding faces is within 
$\epsilon$ of $s_k$. Hence, an upper bound on the number of possible faces 
consistent with $s_k$, given face i assigned to point $P_{k-1}$, is 

$$ \int_{x=-\epsilon}^{\epsilon} d\rho_i(s_k + x). $$

Since the number of nodes at the (k-1)st level of the tree is bounded by $b_{k-1}$, 
an upper bound on the total number of nodes at the kth level of the tree is 

$$ b_k = b_{k-1} \max_i \int_{x=-\epsilon}^{\epsilon} d\rho_i(s_k + x). $$




Grimson & Lozano-Pérez 


Model-Based Recognition 








We now wish to determine a bound on the expected number of nodes, evaluated 
over the range of possible values for $s_k$. If $\Psi(s)$ denotes the distribution 
of sensed distances, then an upper bound on the expected number of nodes is 

$$ \int_{s=0}^{d} \left[ b_{k-1} \max_i \int_{x=-\epsilon}^{\epsilon} d\rho_i(s + x) \right] d\Psi(s). $$

If we knew which object was being sensed, we could derive an explicit form for 
$d\Psi(s)$. Since we are considering the case of sensing from a set of possible 
objects, the best we can do is consider the distribution of sensed distances over 
all possible orientations of all objects, and this is best given by a uniform 
distribution. Thus, 

$$ d\Psi(s) = \frac{1}{d}\, ds $$

and an upper bound on the expected number of nodes becomes 

$$ \frac{b_{k-1}}{d} \max_i \int_{s=0}^{d} \int_{x=-\epsilon}^{\epsilon} d\rho_i(s + x)\, ds. $$

Note that this double integration can essentially be considered as a counting 
problem. That is, we want to count the number of faces whose separation from 
the sampled face lies in an $\epsilon$-range about some point s, with this number 
being accumulated over all possible $\epsilon$-ranges (i.e. vary the midpoint s). 
Reversing the order of integration basically reverses the order of counting. Thus, 
rather than counting the number of faces lying within a range, and summing over the 
set of ranges, we count the number of ranges in which each face is included, and sum 
over the number of faces. Clearly each face can be counted in at most $2\epsilon$ 
ranges (as the midpoint of the range moves from $s - \epsilon$ to $s + \epsilon$), 
and the total number of faces is n. Thus, the branching factor at this level yields 
the iterative expression 

$$ b_k = b_{k-1} \frac{2 \epsilon n}{d}. $$

The base case of k = 1 yields the bound of $b_1 = n$ since the initial assignment of 
the point $P_1$ is arbitrary. 

Evaluation of the iterative expression yields 

$$ b_k = n \left( \frac{2 \epsilon n}{d} \right)^{k-1}, $$

thereby concluding the proof by induction. ∎ 


While this proposition gives us an upper bound on the expected number of 
nodes, in order to evaluate it we need some estimate of $\epsilon$. The following 
two propositions provide this for the two- and three-dimensional cases. 

Proposition 2: If all the edges of a two-dimensional object have the same length 
e, then $\forall i, j$, $\epsilon_{ij} \le e$. 

Proof: Connect the midpoints of two arbitrary faces, i and j, with a line of 
length $d_{ij}$. Consider first the case of $d_{ij} > e$. The set of all possible 
orientations of each of the edges about its midpoint describes a disk of radius 
$e/2$ about that midpoint. We are interested in the extrema in separation of points 
in these disks (see Figure 10). We claim that the maximum and minimum separation of 
points in the disks occur for the case of the edges parallel to the midpoint 
connector, giving a minimum of $d_{ij} - e$ and a maximum of $d_{ij} + e$. 

Figure 10. Illustration for Proof of Proposition 2 

While this can be shown algebraically, there is also a simple geometric proof. 
Construct a coordinate system with origin at the midpoint of edge i and with x axis 
along the midpoint connector. Now construct a circle of radius $d_{ij} - e$ about the 
point $(d_{ij} - e/2, 0)$. Clearly, this circle grazes the first disk at the point 
$(e/2, 0)$. Now, in order for any other point in the second disk to have a shorter 
distance, we must be able to position a circle of the same radius about that point 
and still intersect the first disk. This is not possible, by the following argument. 
The envelope of possible points can be formed by sweeping a circle of radius 
$d_{ij} - e$ through a series of positions such that the center of the circle lies at 
the limit of the second disk. This envelope only intersects the first disk at the 
above mentioned point, and hence, the minimum possible separation between the two 
edges is given by $d_{ij} - e$. 

Similarly, the maximum separation can be shown to be $d_{ij} + e$ by constructing 
a circle of radius $d_{ij} + e$ about the point $(d_{ij} + e/2, 0)$ and using the 
same argument. 

If $d_{ij} \le e$, then the minimum distance is clearly bounded below by 0. The 
construction for the bound on the maximum distance is identical to that above. 
Hence, we see that $\epsilon_{ij} \le e$. ∎ 
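
The claim of Proposition 2 can be checked empirically by sampling. The following 
sketch, which is an illustration and not part of the proof, places two edges of 
length e with midpoints a distance d apart, gives them random orientations, and 
records the extreme separations of points on them. All names are hypothetical. 

```python
import math
import random


def separation_extremes(d, e, samples=10000, seed=0):
    """Observed (min, max) distance between points on two edges of length e.

    The edge midpoints lie at (0, 0) and (d, 0). Each sample draws a random
    orientation for each edge and a random point along each edge.
    """
    rng = random.Random(seed)
    lo, hi = float("inf"), 0.0
    for _ in range(samples):
        a1, a2 = rng.uniform(0.0, math.pi), rng.uniform(0.0, math.pi)
        t1, t2 = rng.uniform(-e / 2, e / 2), rng.uniform(-e / 2, e / 2)
        p = (t1 * math.cos(a1), t1 * math.sin(a1))
        q = (d + t2 * math.cos(a2), t2 * math.sin(a2))
        s = math.dist(p, q)
        lo, hi = min(lo, s), max(hi, s)
    return lo, hi
```

For any d and e, the sampled separations should stay between max(0, d - e) and 
d + e, which is the range asserted by the proposition. 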


Corollary: If all the edges of a two-dimensional object have the same length, 
and the sensor error in measuring distances is much smaller than the length of an 
edge, then the expected number of nodes at the kth level of the interpretation tree, 
k > 2, that survive distance pruning is bounded above by 

$$ \left( \frac{2p}{d} \right)^{k-1} n $$

where p is the perimeter of the object, and d is its diameter. 

Proof: Since the sensor error is much less than the edge length, we see that 
$\epsilon$ is essentially given by the maximum over all i, j of $\epsilon_{ij}$. 
From the previous proposition, this is bounded by the edge size e, and since all the 
edges are of the same length, $e = p/n$. The corollary follows naturally. ∎ 

Note that for convex objects, $p \le \pi d$, so that the bound becomes linear in n: 

$$ (2\pi)^{k-1} n. $$

In general, the perimeter for non-convex objects can be much larger. We note, 
however, that for highly convoluted objects, if sensing is along straight lines, then 
much of the perimeter of the object is “invisible” to the sensor. This follows from 
the observation that sensing at such a face would require sensing through some 
other portion of the object. Thus, in practice, the perimeter term in the above 
expression for non-convex objects, should be replaced by an “effective perimeter”, 
which will generally correspond to the perimeter of a nearly-convex object. 

Proposition 3: If all the faces of a three-dimensional object have the same 
diameter e, and the same area $A_f$, then $\forall i, j$, $\epsilon_{ij} \le e$. 

Proof: The proof is almost identical to the two-dimensional case. Here the 
geometric construction consists of two spheres of radius $e/2$ centered about the 
endpoints of a line of length $d_{ij}$, and we seek the minimum and maximum 
separations of points on the two spheres. As in the previous case, a geometric 
construction shows that the extremal cases occur when the diameters of the faces 
are parallel to the midpoint connectors, and hence $\epsilon_{ij} \le e$. ∎ 

Corollary: If all the faces of a three-dimensional object have the same diameter 
and the same surface area, and the sensor error in measuring distances is much 
smaller than the diameter of a face, then the expected number of nodes at the kth 
level of the interpretation tree, k > 2, that survive distance pruning is bounded 
above by 

$$ \left( \frac{4}{d} \sqrt{\frac{nA}{\pi}} \right)^{k-1} n $$

where A is the surface area of the object and d is its diameter. 

Proof: Since the sensor error in measuring distance is much less than the 
diameter of a face, we see that $\epsilon$ is essentially given by the maximum over 
all i, j of $\epsilon_{ij}$. From the previous proposition, this is bounded by the 
face diameter e. If $A_f$ is the surface area of the face, then 
$A_f \ge \pi (e/2)^2$. Moreover, $A_f = A/n$, so that 

$$ \epsilon \le e \le 2 \sqrt{\frac{A}{\pi n}} $$

and the corollary follows. ∎ 

If the object is convex, then the area A is bounded above by $\pi d^2$, and the 
upper bound reduces to 

$$ 4^{k-1} n^{(k+1)/2}. $$

As in the two-dimensional case, non-convex objects can essentially be treated as 
convex ones, where the surface area of a convoluted object is replaced by the 
“effective surface area” of a nearly-convex one, and a similar bound will hold. 
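
The bounds of this section are easy to evaluate numerically. The sketch below 
assumes the closed form that follows from the recursion b_k = b_{k-1} (2 eps n / d) 
with b_1 = n, namely n (2 eps n / d)^(k-1), together with the convex special cases 
(2 pi)^(k-1) n in two dimensions and 4^(k-1) n^((k+1)/2) in three. The function 
names are hypothetical. 

```python
import math


def distance_bound(n, k, eps, d):
    """General bound n * (2*eps*n/d)**(k-1) on nodes at level k of the tree.

    n: number of faces, k: number of sensed points,
    eps: distance sensitivity, d: diameter of the object.
    """
    return n * (2.0 * eps * n / d) ** (k - 1)


def convex_2d_bound(n, k):
    """Two-dimensional convex case, using eps = p/n and p <= pi*d."""
    return (2.0 * math.pi) ** (k - 1) * n


def convex_3d_bound(n, k):
    """Three-dimensional convex case, using A <= pi*d**2."""
    return 4.0 ** (k - 1) * n ** ((k + 1) / 2.0)
```

For a convex object with n = 34 faces and k = 3 sensed points, convex_3d_bound 
gives 16 * 34^2 = 18496, compared with 34^3 = 39304 interpretations in the 
unpruned tree. 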

6.2. The Relevance of the Combinatorics 

The key point to be stressed here is that the use of distance pruning can be 
shown to reduce the interpretation problem significantly. In principle, the problem 
of k sensor points against a model of n faces would result in $n^k$ possible 
interpretations that must be tested. We have shown that for two-dimensional objects, 
distance pruning reduces this to a number linear in n, and for three-dimensional 
objects, the number is reduced to at most one proportional to $n^{(k+1)/2}$. 

We also stress that this is a weak upper bound, in particular because the 
analysis does not consider the full constraint of distance pruning. The analysis given 
considers the sequential pruning obtained by iteratively applying the constraint 
imposed by the sensed distance between the (k+1)st sensed point and the kth 
one. Clearly, given k sensed points, there are $\binom{k}{2}$ different distance 
constraints, and taking all of these into account should provide a tighter bound. 
Moreover, the bounds derived refer to the pruning due to a single type of 
constraint. Clearly, when all three constraints are used, we would expect the number 
of possible interpretations to be further reduced. 

It was a surprise to the authors that weak upper bounds on the number of 
interpretations would be less than exponential in the number of sensed points, k (for 
example in the three degree of freedom case, where the number of interpretations 
is linear in the number of sensed points). In our experience, however, many people 
find it surprising that any of the bounds should grow with k. Most people expect 
them to decrease with k, i.e., as more points are acquired, the constraint should be 
tighter. Recall, however, that the bounds derived above do not take into account 
the fact that there are $\binom{k}{2}$ distance constraints at the kth level of the 
tree; they only apply a single constraint at each level of the tree. There is 
another important effect that (partially) accounts for the growth in the number of 
interpretations with k, namely, that for k < 6 each interpretation corresponds to a 
continuous range of positions and orientations. For example, for k = 1, each 
interpretation corresponds to the whole space of positions and orientations. As more 
points are added, the “volume” in the space of positions and orientations consistent 
with each interpretation decreases, but the number of these interpretations may 
increase (as they do between k = 1 and k = 2).² 

² We are indebted to John Canny for this observation. 

7. Discussion 

It is important to note that the algorithm described in this paper has quite 
low computational cost. The pruning algorithm is particularly efficient. The range 
tables store all the model information needed, and pruning is done by simply 
comparing the ranges of values measured (plus or minus error estimates) with those 
in the tables. Therefore, no arithmetic is done during pruning (except for indexing 
into tables). It is only the model test that requires any significant computation, 
hence the desire to minimize the number of times it must be performed. 
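
The table-lookup pruning described above can be sketched as a depth-first 
generation of the interpretation tree. The sketch is an illustration only: it uses 
the distance constraint alone, assumes the range table is a nested list of (lo, hi) 
pairs indexed by face, and all names are hypothetical. 

```python
def prune_tree(sensed_dist, dist_range, n_faces, err):
    """Generate the interpretation tree depth first, pruning on distance.

    sensed_dist[a][b]: measured distance between sensed points a and b.
    dist_range[i][j]:  precomputed (lo, hi) range of distances between
                       points on model faces i and j.
    err:               error estimate on a measured distance.
    Returns every assignment of faces to all sensed points that survives.
    """
    k = len(sensed_dist)
    survivors = []

    def consistent(assigned, face):
        b = len(assigned)  # index of the sensed point now being assigned
        for a, face_a in enumerate(assigned):
            lo, hi = dist_range[face_a][face]
            s = sensed_dist[a][b]
            # Reject when the measured interval misses the table interval.
            if s + err < lo or s - err > hi:
                return False
        return True

    def extend(assigned):
        if len(assigned) == k:
            survivors.append(tuple(assigned))
            return
        for face in range(n_faces):
            if consistent(assigned, face):
                extend(assigned + [face])  # inconsistent subtrees are skipped

    extend([])
    return survivors
```

Only the assignments returned by such a procedure would be passed on to the more 
expensive model test. 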

To illustrate this point, we have recorded actual run times for a number of 
simulations. While the times are clearly dependent on a number of factors, such 
as the type of machine, the specific algorithm, the object sensed, and so on, the 
order of magnitude of the run times helps illustrate the computational efficiency of 
the method. For example, using an implementation in Lisp running on a Symbolics 
3600 Lisp Machine, simulations on the motor housing with an angular error range of 
[illegible] and a positional error range of 0.05 took an average of 1.27 seconds to 
generate and prune the interpretation tree and an average of 3.17 seconds to perform 
the model check. The time required to generate and prune the tree is clearly 
dependent on the number of plausible interpretations and grows non-linearly with an 
increase in this number. The time required to perform model checking grows linearly 
with the number of interpretations to which such a check must be applied. The average 
time expended on each model check was 0.24 seconds. In general, the average time to 
complete the computation was under 5 seconds for this particular implementation, 
although this number would occasionally be exceeded in sensing situations in which 
a large number of interpretations were possible. 

The local constraint method developed here requires that all the sensory data 
be drawn from one object. This is difficult to guarantee, in the tactile or visual 
domain, when the object is in a bin among other objects. Of course, if a hypothesis 
is made that all the points belong to one object and no feasible interpretations 
are found, then one can tell that the hypothesis is wrong. Much more research is 
needed in this area, however. 

Throughout the paper we have limited our attention to the number of 
interpretations, relative to one model, of data obtained from that object. To 
carry out recognition among several objects, one determines the number of legal 
interpretations of one set of data relative to multiple object models. This process 
can simply be performed sequentially on each model. One simple improvement is 
clearly possible. If one stores with each model the maximum distance between any 
of the faces, then if one of the measured distances is greater than this upper bound, 
the model can be discarded at once. This technique quickly separates large objects 
from small ones. Unfortunately, very small measured distances do not rule out 
large objects. A second method would be to use direction histograms to rule out 
certain models. For example, if the angle between two sensed normals was 30°, 
then a model of a cube would not be consistent with this data, and could quickly 
be excluded. 
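
The first screening test can be sketched as follows, assuming each model is 
summarized only by the maximum distance between any two of its faces. The 
dictionary layout and function names are hypothetical. 

```python
def may_match(model_max_dist, measured_dists, err):
    """False if some measured distance exceeds the model's largest separation."""
    return all(s - err <= model_max_dist for s in measured_dists)


def screen_models(models, measured_dists, err):
    """Keep the names of models that cannot be discarded on distance alone.

    models: mapping from model name to its maximum face-to-face distance.
    """
    return [name for name, max_dist in models.items()
            if may_match(max_dist, measured_dists, err)]
```

As noted above, the test is one-sided: a large measured distance eliminates small 
models, but small measured distances leave every model in play. 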

After generating and pruning the interpretation tree and performing the model 
test on each of the known objects, we have a listing of all the positions and 
orientations of all objects consistent with the measured data. At this point, further 
discrimination can be carried out by additional unguided sensing as before or 
by considering the alternatives and choosing a good place to sense next. The 
recognition problem that remains is now amenable to other techniques as well, since 
it has been reduced to the much more tractable problem of differentiating among 
a class of objects in known positions and orientations. 
Appendix I 

Here, we establish the claim of section 2.1.2 that the set 

$$ \{ \mathbf{n}_1 \cdot \mathbf{n}_2 \mid \mathbf{n}_1 \cdot \mathbf{u}_1 \ge \epsilon_1,\ \mathbf{n}_2 \cdot \mathbf{u}_2 \ge \epsilon_2 \} $$

is contained in the set 

$$ \{ \mathbf{n}_1 \cdot \mathbf{n}_2 \mid \cos[\min(\pi, \theta_{12} + \phi_1 + \phi_2)] \le \mathbf{n}_1 \cdot \mathbf{n}_2 \le \cos[\max(0, \theta_{12} - (\phi_1 + \phi_2))] \} $$

where 

$$ \cos\phi_1 = \epsilon_1, \qquad \cos\phi_2 = \epsilon_2, \qquad \cos\theta_{12} = \mathbf{u}_1 \cdot \mathbf{u}_2 = \gamma. $$

Figure 11. Extremal values of dot products between two cones 

While it is possible to prove this algebraically, it is simpler to see this by the 
following geometric construction (see Figure 11). We wish to determine the extremal 
values of the dot product between unit vectors in the two cones, or equivalently, 
extremal values in the angle between any two such vectors. If the cones about 
$\mathbf{u}_1$ and $\mathbf{u}_2$ intersect, clearly the maximum value of the dot 
product is 1. If the cones are antipodal, clearly the minimum value is -1. 

We now consider cases in which the cones do not overlap. We claim that the 
extremal values for the dot product occur when the two vectors lie in the plane 
spanned by $\mathbf{u}_1$ and $\mathbf{u}_2$, with the vectors lying at the limits of the cones within 
this plane. That is, if we let 

$$\rho_i = \sqrt{\frac{1 - \epsilon_i^2}{1 - \gamma^2}}, \qquad i = 1, 2,$$

(the value for which each $\mathbf{n}_i$ below is a unit vector satisfying 
$\mathbf{n}_i \cdot \mathbf{u}_i = \epsilon_i$), then the extrema occur at 

$$\mathbf{n}_1 = (\epsilon_1 - \gamma\rho_1)\,\mathbf{u}_1 + \rho_1\,\mathbf{u}_2$$
$$\mathbf{n}_2 = \rho_2\,\mathbf{u}_1 + (\epsilon_2 - \gamma\rho_2)\,\mathbf{u}_2$$

and 

$$\mathbf{n}_1 = (\epsilon_1 + \gamma\rho_1)\,\mathbf{u}_1 - \rho_1\,\mathbf{u}_2$$
$$\mathbf{n}_2 = -\rho_2\,\mathbf{u}_1 + (\epsilon_2 + \gamma\rho_2)\,\mathbf{u}_2$$

The first case can be shown to correspond to the minimal angle between vectors 
in the two cones, by the following construction. Construct a cone centered about 
$\mathbf{n}_1$ with radius such that $\mathbf{n}_2$ lies on the boundary of the cone; that is, the new 
cone grazes the $\mathbf{u}_2$ cone at $\mathbf{n}_2$. If there is a smaller angle, it must be possible to 
reposition this cone so that it is centered at some other point in the $\mathbf{u}_1$ cone and 
yet still intersects the $\mathbf{u}_2$ cone. This is clearly not possible, and hence the minimum 
angle (or maximum dot product) is given by the stated choice of $\mathbf{n}_1$ and $\mathbf{n}_2$. 
Expanding the dot product for this case, and making the appropriate trigonometric 
substitutions, yields the required expression. A similar construction holds for the 
maximum angle (or minimum dot product). 
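
The expansion mentioned above can be sketched as follows. The sketch assumes the first-case vectors $\mathbf{n}_1 = (\epsilon_1 - \gamma\rho_1)\mathbf{u}_1 + \rho_1\mathbf{u}_2$ and $\mathbf{n}_2 = \rho_2\mathbf{u}_1 + (\epsilon_2 - \gamma\rho_2)\mathbf{u}_2$, and it assumes $\rho_i = \sin\phi_i / \sin\theta_{12}$, which is what the unit-length condition on $\mathbf{n}_i$ requires. It uses $\mathbf{u}_1 \cdot \mathbf{u}_2 = \gamma = \cos\theta_{12}$, $\epsilon_i = \cos\phi_i$ and $1 - \gamma^2 = \sin^2\theta_{12}$.

```latex
\begin{aligned}
\mathbf{n}_1 \cdot \mathbf{n}_2
  &= \gamma\,\epsilon_1\epsilon_2
     + (1-\gamma^2)(\epsilon_1\rho_2 + \epsilon_2\rho_1)
     - \gamma(1-\gamma^2)\,\rho_1\rho_2 \\
  &= \cos\theta_{12}\cos\phi_1\cos\phi_2
     + \sin\theta_{12}\,(\cos\phi_1\sin\phi_2 + \cos\phi_2\sin\phi_1)
     - \cos\theta_{12}\sin\phi_1\sin\phi_2 \\
  &= \cos\theta_{12}\cos(\phi_1+\phi_2) + \sin\theta_{12}\sin(\phi_1+\phi_2) \\
  &= \cos(\theta_{12} - \phi_1 - \phi_2).
\end{aligned}
```

The second case differs only in the sign of $\rho_1$ and $\rho_2$, so the same steps give $\cos(\theta_{12} + \phi_1 + \phi_2)$.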

Appendix II 

Here, we show how to compute the range of possible direction vectors between 
face i and face j in the object model. Let us erect a coordinate system on face i, 
with its origin at the centroid of the face and its z axis pointing along the normal 
of the face. Then, it is clear that the set of possible direction vectors is the set 

$$\{\mathbf{v}_j - \mathbf{v}_i \mid \mathbf{v}_j \in \mathrm{face}_j \ \&\ \mathbf{v}_i \in \mathrm{face}_i\}$$

where both $\mathbf{v}_i$ and $\mathbf{v}_j$ are expressed relative to the frame on face i. Assume, for 
now, that both faces are convex. It can be shown [Lozano-Perez 83] that this set is 
equivalent to 

$$\mathrm{ch}(\{\mathbf{v}_j - \mathbf{v}_i \mid \mathbf{v}_j \in \mathrm{vert}(\mathrm{face}_j) \ \&\ \mathbf{v}_i \in \mathrm{vert}(\mathrm{face}_i)\})$$

where ch( ) is the convex hull of a set of points and vert( ) is the set of vertices of a 
face. Because of convexity, the extrema of the component of the direction vectors 
along the normal of face i occur at the vertices of this convex hull. Clearly, the 
vertices of the convex hull of a set of points are drawn from the set of points itself. 
Therefore, we need only find the extrema of the finite set 

$$\{\mathbf{n}_i \cdot (\mathbf{v}_j - \mathbf{v}_i) \mid \mathbf{v}_j \in \mathrm{vert}(\mathrm{face}_j) \ \&\ \mathbf{v}_i \in \mathrm{vert}(\mathrm{face}_i)\}$$

where $\mathbf{n}_i$ is the normal to face i. 

When the faces are non-convex, the procedure above will generate a conservative 
bound. 
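
The vertex-based procedure above can be sketched in a few lines. The function name `normal_component_range` and the tuple representation of vertices are assumptions made for illustration. The sketch also assumes that all vertices and the normal are given in one common frame; since the dot product does not depend on the choice of frame, this need not be the frame on face i.

```python
def normal_component_range(face_i_vertices, face_j_vertices, normal_i):
    """Extrema of n_i . (v_j - v_i) over all vertex pairs of two faces.

    face_i_vertices, face_j_vertices: lists of (x, y, z) tuples.
    normal_i: unit normal of face i, in the same frame as the vertices.
    Returns (minimum, maximum). The range is exact for convex faces and
    a conservative bound for non-convex ones.
    """
    values = [
        sum(n * (pj - pi) for n, pj, pi in zip(normal_i, vj, vi))
        for vi in face_i_vertices
        for vj in face_j_vertices
    ]
    return min(values), max(values)

# Hypothetical example: the top face of a unit cube (z = 0, normal +z)
# and one adjacent side face (x = 1) hanging below it.
top = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
side = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, -1.0), (1.0, 0.0, -1.0)]
component_min, component_max = normal_component_range(top, side, (0.0, 0.0, 1.0))
```

The cost is one dot product per vertex pair, so the ranges for every pair of model faces can be computed once, ahead of time, and stored with the model.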